diff --git a/docs/make.jl b/docs/make.jl index b22ebab8..242a9819 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -6,14 +6,32 @@ makedocs( sitename = "IterativeSolvers.jl", pages = [ "Home" => "index.md", - "Manual" => "user_manual.md", - "Library" => [ - "Public" => "library/public.md", - "Internal" => "library/internal.md", - ], + "Getting started" => "getting_started.md", + "Preconditioning" => "preconditioning.md", + "Linear systems" => [ + "Conjugate Gradients" => "linear_systems/cg.md", + "Chebyshev iteration" => "linear_systems/chebyshev.md", + "MINRES" => "linear_systems/minres.md", + "BiCGStab(l)" => "linear_systems/bicgstabl.md", + "IDR(s)" => "linear_systems/idrs.md", + "Restarted GMRES" => "linear_systems/gmres.md", + "LSMR" => "linear_systems/lsmr.md", + "LSQR" => "linear_systems/lsqr.md", + "Stationary methods" => "linear_systems/stationary.md" + ], + "Eigenproblems" => [ + "Power method" => "eigenproblems/power_method.md", + ], + "SVDL" => "svd/svdl.md", + "Randomized algorithms" => "randomized.md", + "The iterator approach" => "iterators.md", + # "Additional resources" => [ + # # "Public" => "library/public.md", + # # "Internal" => "library/internal.md", + # ], "About" => [ - "Contributing" => "about/CONTRIBUTING.md", - "License" => "about/license.md", + "Contributing" => "about/CONTRIBUTING.md", + "License" => "about/license.md", ] ] ) diff --git a/docs/src/about/CONTRIBUTING.md b/docs/src/about/CONTRIBUTING.md index 926d41ba..2dfb7ab2 100644 --- a/docs/src/about/CONTRIBUTING.md +++ b/docs/src/about/CONTRIBUTING.md @@ -1,7 +1,7 @@ # Contributing Contributions are always welcome, as are feature requests and suggestions. Feel -free to open [issues](https://help.github.com/articles/creating-an-issue/) and [pull requests](https://help.github.com/articles/creating-a-pull-request/) at any time. If you aren't familiar with git or Github please start [now](https://help.github.com/articles/good-resources-for-learning-git-and-github/). +free to open an [issue](https://github.com/JuliaMath/IterativeSolvers.jl/issues) or [pull request](https://github.com/JuliaMath/IterativeSolvers.jl/pulls) at any time. It is important to note that almost every method in the package has documentation, to know what it does simply use `?` in the terminal. @@ -79,85 +79,3 @@ A more detailed explanation of all the functions is in both the public and inter documentation of `ConvergenceHistory`. The most rich example of the usage of `ConvergenceHistory` is in `svdl`. - - -## Adding benchmarks - -The [Benchmarks](@ref) tab of the documentation is built automatically with Travis. -Any benchmark added will be displayed automatically after a successful pull request. - -The benchmark suite gets built doing a cross product between the available matrices -and available methods, if there are `n` methods and `m` linear operators then `n*m` will be -the upper limit of benchmarks to be made. Some methods are not compatible with certain -matrices, to avoid generating unnecessary benchmarks each method and matrix has -traits, linear operator traits are inspired from [MatrixDepot.jl](http://matrixdepotjl.readthedocs.io/en/latest/properties.html). - -**Method traits** - -* accessible : Method accesses the linear operator's fields. -* inverse : `A`'s Inverse must exist. -* symmetric : `A`'s must be symmetric. -* pos-def : `A`'s must be definite. - -**Linear Operator traits** - -* accessible : Is accessible. -* inverse : `A` is exist. -* symmetric : `A` is symmetric. -* pos-def : `A` is definite. 
-* eigen : Part of the eigensystem of the matrix is explicitly known.
-* graph : An adjacency matrix of a graph.
-* ill-cond : The matrix is ill-conditioned for some parameter values.
-* random : The matrix has random entries.
-* regprob : The output is a test problem for Regularization Methods.
-* sparse : The matrix is sparse.
-
-A benchmark between a method and a linear operator will be made if and only if
-the traits of the method is subset of the traits of the linear operator.
-
-Benchmarks are stored in [Benchmarks.jl](https://github.com/JuliaLang/IterativeSolvers.jl/tree/master/benchmark/Benchmarks.jl).
-To add a method use `addEqMethod`.
-
-```julia
-addEqMethod(methods, "jacobi", jacobi, ["inverse","accessible"])
-addEqMethod(methods, "gauss_seidel", gauss_seidel, ["inverse","accessible"])
-addEqMethod(methods, "sor", sor, ["inverse","accessible"])
-addEqMethod(methods, "ssor", ssor, ["inverse","accessible", "symmetric"])
-addEqMethod(methods, "cg", cg, ["inverse", "symmetric", "pos-def"])
-addEqMethod(methods, "gmres", gmres, ["inverse"])
-addEqMethod(methods, "lsqr", lsqr, ["inverse"])
-addEqMethod(methods, "chebyshev", chebyshev, ["inverse", "accessible"])
-```
-
-Here `methods` is a dictionary, the second argument is the name to be displayed in
-the benchmarks, the third argument is the function and the fourth is the traits.
-Every function has a predetermined call in `buildCall` function.
-
-To add an equation use `addEquation`.
-
-```julia
-#Sparse matrix equations
-addEquation(
-    equations, "Poisson", ["Sparse", "Poisson"],
-    ["sparse","inverse", "symmetric", "pos-def", "eigen", "accessible"],
-    :(matrixdepot("poisson",4))
-    )
-
-#Function matrix equations
-addEquation(
-    equations, "SOLtest", ["Function", "SOLtest"],
-    ["function","inverse"],
-    :(buildSol(10)),
-    10
-)
-```
-
-Here `equations` is a dictionary, the second argument is the name to be displayed in
-the benchmarks, the third argument is the path inside the `BenchmarkGroup` type
-the fourth argument is the traits, the fifth is the matrix generator and
-the sixth is the size of the matrix. The size of the matrix has to be passed when
-it is impossible to deduce the dimension from the generator, in this case buildSol
-generates a function and not a matrix.
-
-To add a custom benchmark use directly the `suite` variable which is the `BenchmarkGroup`
-of the package, to know more of this type check [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl).
diff --git a/docs/src/eigenproblems/power_method.md b/docs/src/eigenproblems/power_method.md
new file mode 100644
index 00000000..443a5fa3
--- /dev/null
+++ b/docs/src/eigenproblems/power_method.md
@@ -0,0 +1,15 @@
+# [(Inverse) power method](@id PowerMethod)
+
+Solves the eigenproblem $Ax = λx$ approximately where $A$ is a general linear map. By default it converges towards the dominant eigenpair $(λ, x)$ such that $|λ|$ is largest. Shift-and-invert can be applied to target a specific eigenvalue near `shift` in the complex plane.
+
+## Usage
+
+```@docs
+powm
+powm!
+invpowm
+invpowm!
+```
+
+## Implementation details
+Storage requirements are 3 vectors: the approximate eigenvector `x`, the residual vector `r` and a temporary. The residual norm lags one iteration behind, as it is computed when $Ax$ is performed. Therefore the final residual norm is even smaller.
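+
+## Example
+
+A minimal sketch of shift-and-invert (the matrix, the shift `σ` and the tolerances below are arbitrary choices for illustration, and [LinearMaps.jl](https://github.com/Jutho/LinearMaps.jl) is an external package): `invpowm` is handed an operator that applies $(A - σI)^{-1}$ together with the `shift` keyword, so that it can recover the corresponding eigenvalue of $A$ itself.
+
+```julia
+using LinearMaps
+
+σ = 1.0 + 1.3im                # target the eigenvalue of A closest to σ
+A = rand(Complex128, 50, 50)
+F = lufact(A - σ * eye(A))     # factorize A - σI once
+
+# Wrap the solve with A - σI as a matrix-free linear operator
+Fmap = LinearMap{Complex128}((y, x) -> A_ldiv_B!(y, F, x), 50, ismutating = true)
+
+λ, x = invpowm(Fmap, shift = σ, tol = 1e-4, maxiter = 200)
+```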
diff --git a/docs/src/getting_started.md b/docs/src/getting_started.md
new file mode 100644
index 00000000..03094157
--- /dev/null
+++ b/docs/src/getting_started.md
@@ -0,0 +1,126 @@
+# Getting started
+
+## Installation
+
+The package can be installed via Julia's package manager.
+
+```julia
+julia> Pkg.add("IterativeSolvers")
+```
+
+## Interface
+
+Virtually all solvers share the following function declarations:
+
+```julia
+solver(A, args...; kwargs...)
+solver!(x, A, args...; kwargs...)
+```
+
+where `A` is a [linear operator](@ref matrixfree) and `x` an initial guess. The second declaration also updates `x` in-place.
+
+### [Explicit matrices and the matrix-free approach](@id matrixfree)
+Rather than constructing an explicit matrix `A` of the type `Matrix` or `SparseMatrixCSC`, it is also possible to pass a general linear operator that performs matrix operations implicitly. This is called the **matrix-free** approach.
+
+For matrix-free types of `A` the following interface is expected to be defined:
+
+- `A*v` computes the matrix-vector product on a `v::AbstractVector`;
+- `A_mul_B!(y, A, v)` computes the matrix-vector product on a `v::AbstractVector` in-place;
+- `eltype(A)` returns the element type implicit in the equivalent matrix representation of `A`;
+- `size(A, d)` returns the nominal dimensions along the `d`th axis in the equivalent matrix representation of `A`.
+
+!!! tip "Matrix-free with LinearMaps.jl"
+    We strongly recommend [LinearMaps.jl](https://github.com/Jutho/LinearMaps.jl) for matrix-free linear operators, as it implements the above methods already for you; you just have to write the action of the linear map.
+
+
+### Additional arguments
+
+Keyword names will vary depending on the method; however, some of them will always have the same spelling:
+
+- `tol`: (relative) stopping tolerance of the method;
+- `verbose`: print information during the iterations;
+- `maxiter`: maximum number of allowed iterations;
+- `Pl` and `Pr`: left and right preconditioner. See [Preconditioning](@ref Preconditioning);
+- `log::Bool = false`: output an extra element of type `ConvergenceHistory` containing the convergence history.
+
+### `log` keyword
+
+Most solvers accept the `log` keyword. When more information about the
+iterations is required, set `log` to `true`.
+
+```julia
+x, ch = cg(rand(10, 10), rand(10), log=true)
+svd, L, ch = svdl(rand(100, 100), log=true)
+```
+
+The function will now return one more parameter of type `ConvergenceHistory`.
+
+## ConvergenceHistory
+
+A [`ConvergenceHistory`](@ref) instance stores information about a solver.
+
+Number of iterations.
+
+```julia
+ch.iters
+```
+
+Convergence status.
+
+```julia
+ch.isconverged
+```
+
+Stopping tolerances. (A `Symbol` key is needed to access)
+
+```julia
+ch[:tol]
+```
+
+Maximum number of iterations per restart. (Only on restarted methods)
+
+```julia
+nrests(ch)
+```
+
+Number of matrix-vectors and matrix-transposed-vector products.
+
+```julia
+nprods(ch)
+```
+
+Data stored at each iteration can be accessed as either a vector
+or a matrix; most commonly this is the residual norm.
+(A `Symbol` key is needed to access)
+
+```julia
+ch[:resnorm] #Vector or Matrix
+ch[:resnorm, x] #Vector or Matrix element
+ch[:resnorm, x, y] #Matrix element
+```
+
+```@docs
+ConvergenceHistory
+```
+
+### Plotting
+
+`ConvergenceHistory` provides a recipe for use with the package [Plots.jl](https://github.com/tbreloff/Plots.jl), which makes it
+easy to plot with different backends. There are two recipes provided:
+
+One for the whole `ConvergenceHistory`.
+
+```julia
+plot(ch)
+```
+
+The other plots data bound to a key.
+
+```julia
+_, ch = gmres(rand(10,10), rand(10), maxiter = 100, log=true)
+plot(ch, :resnorm, sep = :blue)
+```
+
+*Plot additional keywords*
+
+`sep::Symbol = :white`: color of the line separator in restarted methods.
diff --git a/docs/src/index.md b/docs/src/index.md
index c853356e..0b92897e 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -1,91 +1,37 @@
 # IterativeSolvers.jl

-IterativeSolvers.jl is a Julia package that provides iterative algorithms for
-solving linear systems, eigensystems, and singular value problems. The purpose
-of this package is to provide efficient Julia implementations for iterative
-methods. The package aims to accept a wide variety of input types and that's
-why most arguments don't specify a specific type, however this is still in
-progress.
+IterativeSolvers.jl is a Julia package that provides efficient iterative algorithms for solving large linear systems, eigenproblems, and singular value problems. Most of the methods can be used *matrix-free*.

-For bug reports, feature requests and questions please submit an issue.
-If you're interested in contributing, please see the [Contributing](@ref) guide.
+For bug reports, feature requests and questions please submit an issue. If you're interested in contributing, please see the [Contributing](@ref) guide.

 For more information on future methods have a look at the package [roadmap](https://github.com/JuliaLang/IterativeSolvers.jl/issues/1) on deterministic methods, for randomized algorithms check [here](https://github.com/JuliaLang/IterativeSolvers.jl/issues/33).

-## Linear Solvers
+## What method should I use for linear systems?

-**Stationary methods**
+When solving linear systems $Ax = b$ for a square matrix $A$ there are quite a few options. The typical choices are listed below:

-* Jacobi
-* Gauss-Seidel
-* Successive over relaxation
-* Symmetric successive over relaxation
+| Method | When to use it |
+|---------------------|--------------------------------------------------------------------------|
+| [Conjugate Gradients](@ref CG) | Best choice for **symmetric**, **positive-definite** matrices |
+| [MINRES](@ref MINRES) | For **symmetric**, **indefinite** matrices |
+| [GMRES](@ref GMRES) | For **nonsymmetric** matrices when a good [preconditioner](@ref Preconditioning) is available |
+| [IDR(s)](@ref IDRs) | For **nonsymmetric**, **strongly indefinite** problems without a good preconditioner |
+| [BiCGStab(l)](@ref BiCGStabl) | Otherwise for **nonsymmetric** problems |

-**Non stationary methods**
+We also offer [Chebyshev iteration](@ref Chebyshev) as an alternative to Conjugate Gradients when bounds on the spectrum are known.

-* IDRS
-* LSMR
-* LSQR
-* Conjugate gradients (CG)
-* Chebyshev iteration
-* Generalized minimal residual method (with restarts) (GMRES)
+Stationary methods like [Jacobi](@ref), [Gauss-Seidel](@ref), [SOR](@ref) and [SSOR](@ref) can be used as smoothers to reduce high-frequency components in the error in just a few iterations.
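+
+For example, a minimal sketch of a few Jacobi sweeps used as a smoother (the diagonally dominant test matrix below is an arbitrary choice for illustration):
+
+```julia
+A = sprand(1_000, 1_000, 10 / 1_000) + 20I  # diagonally dominant example
+b = rand(1_000)
+
+# A handful of sweeps damps the high-frequency error components
+x = jacobi(A, b, maxiter = 4)
+```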
-## Eigen Solvers
+When solving **least-squares** problems we currently offer just [LSMR](@ref LSMR) and [LSQR](@ref LSQR).

-*Simple eigenpair iterations*
+## Eigenproblems and SVD

-* Power iteration
-* Inverse power iteration
+For the Singular Value Decomposition we offer [SVDL](@ref SVDL), which is the Golub-Kahan-Lanczos procedure.

-**Hermitian**
+For eigenvalue problems we have at this point just the [Power Method](@ref PowerMethod) and some convenience wrappers to do shift-and-invert.

-*Lanczos*
+## Randomized algorithms

-* Simple Lanczos
+[Randomized algorithms](@ref Randomized) have gotten some traction lately. Some of the methods mentioned in [^Halko2011] have been implemented as well, although their quality is generally poor. Note that many classical methods such as subspace iteration and BiCG, as well as recent methods like IDR(s), are "randomized" in some sense.

-## Singular Value Decomposition
-
-* Golub-Kahan-Lanczos
-
-## Randomized
-
-* Condition number estimate
-* Extremal eigenvalue estimates
-* Norm estimate
-* Randomized singular value decomposition
-
-
-
-## Documentation Outline
-
-### Manual
-
-```@contents
-Pages = [
-    "user_manual.md",
-]
-Depth = 2
-```
-
-### Library
-
-```@contents
-Pages = ["library/public.md", "library/internal.md"]
-Depth = 2
-```
-
-### [Index](@id main-index)
-
-### Functions
-
-```@index
-Pages = ["library/public.md", "library/internals.md"]
-Order = [:function]
-```
-
-### Types
-
-```@index
-Pages = ["library/public.md", "library/internals.md"]
-Order = [:type]
-```
+[^Halko2011]: Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM review 53.2 (2011): 217-288.
\ No newline at end of file
diff --git a/docs/src/iterators.md b/docs/src/iterators.md
new file mode 100644
index 00000000..ef148b0c
--- /dev/null
+++ b/docs/src/iterators.md
@@ -0,0 +1,75 @@
+# [Iterative solvers as iterators](@id Iterators)
+
+In advanced use cases you might want to access the internal data structures of the solver, inject code to be run after each iteration, have total control over allocations or reduce overhead in initialization. The iterator approach of IterativeSolvers.jl makes this possible.
+
+!!! note
+    At this point [BiCGStab(l)](@ref BiCGStabl), [CG](@ref CG), [Chebyshev](@ref Chebyshev), [GMRES](@ref GMRES), [MINRES](@ref MINRES) and the [stationary methods](@ref Stationary) are implemented as iterators. However, the package does not yet export the iterators and helper methods themselves.
+
+## How iterators are implemented
+The solvers listed above are basically thin wrappers around an iterator. Among other things, they initialize the iterable, loop through the iterator and return the result:
+
+```julia
+function my_solver!(x, A, b)
+    iterable = MySolverIterable(x, A, b)
+    for item in iterable end
+    return iterable.x
+end
+```
+
+Rather than calling `my_solver!(x, A, b)`, you could also initialize the iterable yourself and perform the `for` loop.
+
+## Example: avoiding unnecessary initialization
+The Jacobi method for `SparseMatrixCSC` has some overhead in initialization; not only do we need to allocate a temporary vector, we also have to search for indices of the diagonal (and check that their values are nonzero).
The current implementation initializes the iterable as: + +```julia +jacobi_iterable(x, A::SparseMatrixCSC, b; maxiter::Int = 10) = + JacobiIterable{eltype(x), typeof(x)}(OffDiagonal(A, DiagonalIndices(A)), x, similar(x), b, maxiter) +``` + +Now if you apply Jacobi iteration multiple times with the same matrix for just a few iterations, it makes sense to initialize the iterable only once and reuse it afterwards: + +```julia +A = sprand(10_000, 10_000, 10 / 10_000) + 20I +b1 = rand(10_000) +b2 = rand(10_000) +x = rand(10_000) + +my_iterable = IterativeSolvers.jacobi_iterable(x, A, b1, maxiter = 4) + +for item in my_iterable + println("Iteration for rhs 1") +end + +@show norm(b1 - A * x) / norm(b1) + +# Copy the next right-hand side into the iterable +copy!(my_iterable.b, b2) + +for item in my_iterable + println("Iteration for rhs 2") +end + +@show norm(b2 - A * x) / norm(b2) +``` + +This would output: + +``` +Iteration for rhs 1 +Iteration for rhs 1 +Iteration for rhs 1 +Iteration for rhs 1 +norm(b1 - A * x) / norm(b1) = 0.08388528015119746 +Iteration for rhs 2 +Iteration for rhs 2 +Iteration for rhs 2 +Iteration for rhs 2 +norm(b2 - A * x) / norm(b2) = 0.0003681972775644809 +``` + +## Other use cases +Other use cases include: +- computing the (harmonic) Ritz values from the Hessenberg matrix in GMRES; +- comparing the approximate residual of methods such as GMRES and BiCGStab(l) with the true residual during the iterations; +- updating a preconditioner in flexible methods. + diff --git a/docs/src/library/internal.md b/docs/src/library/internal.md index 5557e031..4d116fee 100644 --- a/docs/src/library/internal.md +++ b/docs/src/library/internal.md @@ -18,7 +18,6 @@ Pages = ["internal.md"] **`Typealiases`** ```@docs -IterativeSolvers.PlainHistory IterativeSolvers.RestartedHistory ``` @@ -28,8 +27,6 @@ IterativeSolvers.RestartedHistory IterativeSolvers.nextiter! IterativeSolvers.reserve! IterativeSolvers.shrink! -IterativeSolvers.setmvps -IterativeSolvers.setmtvps IterativeSolvers.setconv IterativeSolvers.showplot ``` @@ -48,6 +45,5 @@ IterativeSolvers.Adivtype IterativeSolvers.Amultype IterativeSolvers.randx IterativeSolvers.zerox -IterativeSolvers.update! IterativeSolvers.initrand!(::Vector) ``` diff --git a/docs/src/library/public.md b/docs/src/library/public.md deleted file mode 100644 index ea315043..00000000 --- a/docs/src/library/public.md +++ /dev/null @@ -1,246 +0,0 @@ -# Public Documentation - -Documentation for `IterativeSolvers.jl`'s public interface. - -```@contents -Pages = ["public.md"] -Depth = 4 -``` - -## Index - -```@index -Pages = ["public.md"] -``` - -## Types - -### `ConvergenceHistory` - -```@docs -ConvergenceHistory -``` - -## Linear Solvers - -### `Jacobi` - -```@docs -jacobi -jacobi! -``` - -### `Gauss-Seidel` - -```@docs -gauss_seidel -gauss_seidel! -``` - -### `Successive over relaxation` - -```@docs -sor -sor! -``` - -### `Symmetric successive over relaxation` - -```@docs -ssor -ssor! -``` - -### `IDRS` - -```@docs -idrs -idrs! -``` -**References** - -``` -[1] IDR(s): a family of simple and fast algorithms for solving large - nonsymmetric linear systems. P. Sonneveld and M. B. van Gijzen - SIAM J. Sci. Comput. Vol. 31, No. 2, pp. 1035--1062, 2008 -[2] Algorithm 913: An Elegant IDR(s) Variant that Efficiently Exploits - Bi-orthogonality Properties. M. B. van Gijzen and P. Sonneveld - ACM Trans. Math. Software,, Vol. 38, No. 1, pp. 
5:1-5:19, 2011 -[3] This file is a translation of the following MATLAB implementation: - http://ta.twi.tudelft.nl/nw/users/gijzen/idrs.m -[4] IDR(s)' webpage http://ta.twi.tudelft.nl/nw/users/gijzen/IDR.html -``` - -### `LSMR` - -```@docs -lsmr -lsmr! -``` - -**References** - -Adapted from: [http://web.stanford.edu/group/SOL/software/lsmr/](http://web.stanford.edu/group/SOL/software/lsmr/) - -### `LSQR` - -```@docs -lsqr -lsqr! -``` - -**References** - -``` -Adapted from: http://web.stanford.edu/group/SOL/software/lsqr/ - -1. C. C. Paige and M. A. Saunders (1982a). - LSQR: An algorithm for sparse linear equations and sparse least squares, - ACM TOMS 8(1), 43-71. - -2. C. C. Paige and M. A. Saunders (1982b). - Algorithm 583. LSQR: Sparse linear equations and least squares problems, - ACM TOMS 8(2), 195-209. - -3. M. A. Saunders (1995). Solution of sparse rectangular systems using - LSQR and CRAIG, BIT 35, 588-604. -``` - - -### `Conjugate gradients` - -```@docs -cg -cg! -``` - -### `Chebyshev iteration` - -```@docs -chebyshev -chebyshev! -``` - -### `Generalized minimal residual method (with restarts)` - -```@docs -gmres -gmres! -``` - -## Eigen Solvers - -### `Power iteration` - -```@docs -powm -``` - -### `Inverse power iteration` - -```@docs -invpowm -``` - -### `Simple Lanczos` - -```@docs -eiglancz -``` - -### `Golub-Kahan-Lanczos` - -```@docs -svdl -``` - -**Implementation notes** - -The implementation of thick restarting follows closely that of SLEPc as -described in [Hernandez2008]. Thick restarting can be turned off by setting `k -= maxiter`, but most of the time this is not desirable. - -The singular vectors are computed directly by forming the Ritz vectors from the -product of the Lanczos vectors `L.P`/`L.Q` and the singular vectors of `L.B`. -Additional accuracy in the singular triples can be obtained using inverse -iteration. - -**References** - -```bibtex -@article{Golub1965, - author = {Golub, G. 
and Kahan, W.},
-    doi = {10.1137/0702016},
-    journal = {Journal of the Society for Industrial and Applied Mathematics
-        Series B Numerical Analysis},
-    volume = 2,
-    number = 2,
-    pages = {205--224},
-    title = {Calculating the Singular Values and Pseudo-Inverse of a Matrix},
-    year = 1965
-}
-
-@article{Wu2000,
-    author = {Wu, Kesheng and Simon, Horst},
-    journal = {SIAM Journal on Matrix Analysis and Applications},
-    number = 2,
-    pages = {602--616},
-    title = {Thick-Restart {L}anczos Method for Large Symmetric Eigenvalue Problems},
-    volume = 22,
-    year = 2000
-}
-
-@article{Baglama2005,
-    author = {Baglama, James and Reichel, Lothar},
-    doi = {10.1137/04060593X},
-    journal = {SIAM Journal on Scientific Computing},
-    number = 1,
-    pages = {19--42},
-    title = {Augmented Implicitly Restarted {L}anczos Bidiagonalization Methods},
-    volume = 27,
-    year = 2005
-}
-
-@article{Hernandez2008,
-    author = {Hern\'{a}ndez, Vicente and Rom\'{a}n, Jos\'{e} E and Tom\'{a}s,
-        Andr\'{e}s},
-    journal = {Electronic Transactions on Numerical Analysis},
-    pages = {68--85},
-    title = {A Robust and Efficient Parallel {SVD} Solver based on Restarted
-        {L}anczos Bidiagonalization},
-    url = {http://etna.mcs.kent.edu/volumes/2001-2010/vol31/abstract.php?vol=31\&pages=68-85},
-    volume = 31,
-    year = 2008
-}
-```
-
-
-## Randomized
-
-### `Condition number estimate`
-
-```@docs
-rcond
-```
-
-### `Extremal eigenvalue estimates`
-
-```@docs
-reigmin
-reigmax
-```
-
-### `Norm estimate`
-
-```@docs
-rnorm
-rnorms
-```
-
-### `Randomized singular value decomposition`
-
-```@docs
-reig
-rsvdfact
-rsvd_fnkz
-```
diff --git a/docs/src/linear_systems/bicgstabl.md b/docs/src/linear_systems/bicgstabl.md
new file mode 100644
index 00000000..518a5317
--- /dev/null
+++ b/docs/src/linear_systems/bicgstabl.md
@@ -0,0 +1,25 @@
+# [BiCGStab(l)](@id BiCGStabl)
+
+BiCGStab(l) solves the problem $Ax = b$ approximately for $x$ where $A$ is a general, linear operator and $b$ the right-hand side vector. The method combines BiCG with $l$ GMRES iterations, resulting in a short-recurrence iteration. As a result both the memory footprint and the computational cost per iteration are fixed.
+
+## Usage
+
+```@docs
+bicgstabl
+bicgstabl!
+```
+
+## Implementation details
+
+The method is based on the original article [^Sleijpen1993], but does not implement later improvements. The normal equations arising from the GMRES steps are solved without orthogonalization. Hence the method should only be reliable for relatively small values of $l$.
+
+The `r` and `u` factors are pre-allocated as matrices of size $n \times (l + 1)$, so that BLAS2 methods can be used. Also the random shadow residual is pre-allocated as a vector. Hence the storage costs are approximately $2l + 3$ vectors.
+
+!!! tip
+    BiCGStab(l) can be used as an [iterator](@ref Iterators).
+
+[^Sleijpen1993]:
+
+    Sleijpen, Gerard LG, and Diederik R. Fokkema. "BiCGstab(l) for
+    linear equations involving unsymmetric matrices with complex spectrum."
+    Electronic Transactions on Numerical Analysis 1.11 (1993): 2000.
\ No newline at end of file
diff --git a/docs/src/linear_systems/cg.md b/docs/src/linear_systems/cg.md
new file mode 100644
index 00000000..2a4db88e
--- /dev/null
+++ b/docs/src/linear_systems/cg.md
@@ -0,0 +1,17 @@
+# [Conjugate Gradients (CG)](@id CG)
+
+Conjugate Gradients solves $Ax = b$ approximately for $x$ where $A$ is a symmetric, positive-definite linear operator and $b$ the right-hand side vector.
The method uses short recurrences and therefore has fixed memory costs and fixed computational costs per iteration.
+
+## Usage
+
+```@docs
+cg
+cg!
+```
+
+## Implementation details
+
+The current implementation follows a rather standard approach. Note that preconditioned CG (or PCG) is slightly different from ordinary CG, because the former must compute the residual explicitly, while it is available as a byproduct in the latter. Our implementation of CG ensures the minimal number of vector operations.
+
+!!! tip
+    CG can be used as an [iterator](@ref Iterators).
\ No newline at end of file
diff --git a/docs/src/linear_systems/chebyshev.md b/docs/src/linear_systems/chebyshev.md
new file mode 100644
index 00000000..4b55c918
--- /dev/null
+++ b/docs/src/linear_systems/chebyshev.md
@@ -0,0 +1,20 @@
+# [Chebyshev iteration](@id Chebyshev)
+
+Chebyshev iteration solves the problem $Ax=b$ approximately for $x$ where $A$ is a symmetric, definite linear operator and $b$ the right-hand side vector. The method assumes the interval $[\lambda_{min}, \lambda_{max}]$ containing all eigenvalues of $A$ is known, so that $x$ can be iteratively constructed via a Chebyshev polynomial with zeros in this interval. This polynomial ultimately acts as a filter that removes components in the direction of the eigenvectors from the initial residual.
+
+The main advantage with respect to Conjugate Gradients is that BLAS1 operations such as inner products are avoided.
+
+## Usage
+
+```@docs
+chebyshev
+chebyshev!
+```
+
+## Implementation details
+
+!!! warning "BLAS1 operations"
+    Although the method is often used to avoid computation of inner products, the stopping criterion is still based on the residual norm. Hence the current implementation is not free of BLAS1 operations.
+
+!!! tip
+    Chebyshev iteration can be used as an [iterator](@ref Iterators).
\ No newline at end of file
diff --git a/docs/src/linear_systems/gmres.md b/docs/src/linear_systems/gmres.md
new file mode 100644
index 00000000..5f5574d2
--- /dev/null
+++ b/docs/src/linear_systems/gmres.md
@@ -0,0 +1,21 @@
+# [Restarted GMRES](@id GMRES)
+
+GMRES solves the problem $Ax = b$ approximately for $x$ where $A$ is a general, linear operator and $b$ the right-hand side vector. The method is optimal in the sense that it selects the solution with minimal residual from a Krylov subspace, but the price of optimality is increasing storage and computation effort per iteration. Restarts are necessary to keep these costs bounded.
+
+## Usage
+
+```@docs
+gmres
+gmres!
+```
+
+## Implementation details
+
+The implementation pre-allocates a matrix $V$ of size `n` by `restart` whose columns form an orthonormal basis for the Krylov subspace. This allows BLAS2 operations when updating the solution vector $x$. The Hessenberg matrix is also pre-allocated.
+
+Modified Gram-Schmidt is used to orthogonalize the columns of $V$.
+
+The computation of the residual norm is implemented in a non-standard way, namely keeping track of a vector $\gamma$ in the null-space of $H_k^*$, which is the adjoint of the $(k + 1) \times k$ Hessenberg matrix $H_k$ at the $k$th iteration. Only when $x$ needs to be updated is the Hessenberg matrix mutated with Givens rotations.
+
+!!! tip
+    GMRES can be used as an [iterator](@ref Iterators). This makes it possible to access the Hessenberg matrix and Krylov basis vectors during the iterations.
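+
+For example, a minimal sketch of the iterator approach (note that `gmres_iterable!` is an unexported, internal helper, so its name, keywords and fields are assumptions that may change):
+
+```julia
+A = rand(100, 100) + 10I
+b = rand(100)
+x = zeros(b)
+
+it = IterativeSolvers.gmres_iterable!(x, A, b, initially_zero = true, tol = 1e-8)
+
+for item in it
+    # custom code can be injected here on every iteration,
+    # e.g. to inspect the internal Hessenberg matrix or Krylov basis
+end
+
+@show norm(b - A * x) / norm(b)
+```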
\ No newline at end of file
diff --git a/docs/src/linear_systems/idrs.md b/docs/src/linear_systems/idrs.md
new file mode 100644
index 00000000..f530563d
--- /dev/null
+++ b/docs/src/linear_systems/idrs.md
@@ -0,0 +1,17 @@
+# [IDR(s)](@id IDRs)
+
+The Induced Dimension Reduction method is a family of simple and fast Krylov subspace algorithms for solving large nonsymmetric linear systems. The idea behind the IDR(s) variant is to generate residuals that are in the nested subspaces of shrinking dimensions.
+
+## Usage
+
+```@docs
+idrs
+idrs!
+```
+
+## Implementation details
+
+The current implementation is based on the [MATLAB version](http://ta.twi.tudelft.nl/nw/users/gijzen/idrs.m) by Van Gijzen and Sonneveld. For background see [^Sonneveld2008], [^VanGijzen2011] and [the IDR(s) webpage](http://ta.twi.tudelft.nl/nw/users/gijzen/IDR.html).
+
+[^Sonneveld2008]: IDR(s): a family of simple and fast algorithms for solving large nonsymmetric linear systems. P. Sonneveld and M. B. van Gijzen. SIAM J. Sci. Comput. Vol. 31, No. 2, pp. 1035--1062, 2008
+[^VanGijzen2011]: Algorithm 913: An Elegant IDR(s) Variant that Efficiently Exploits Bi-orthogonality Properties. M. B. van Gijzen and P. Sonneveld. ACM Trans. Math. Software, Vol. 38, No. 1, pp. 5:1-5:19, 2011
\ No newline at end of file
diff --git a/docs/src/linear_systems/lsmr.md b/docs/src/linear_systems/lsmr.md
new file mode 100644
index 00000000..2e368715
--- /dev/null
+++ b/docs/src/linear_systems/lsmr.md
@@ -0,0 +1,14 @@
+# LSMR
+
+LSMR (least-squares minimal residual) solves the least-squares problem $\min \|Ax - b\|$; it is algebraically equivalent to applying MINRES to the normal equations, but has better numerical properties.
+
+## Usage
+
+```@docs
+lsmr
+lsmr!
+```
+
+## Implementation details
+
+Adapted from: [http://web.stanford.edu/group/SOL/software/lsmr/](http://web.stanford.edu/group/SOL/software/lsmr/)
diff --git a/docs/src/linear_systems/lsqr.md b/docs/src/linear_systems/lsqr.md
new file mode 100644
index 00000000..ed2d9cab
--- /dev/null
+++ b/docs/src/linear_systems/lsqr.md
@@ -0,0 +1,12 @@
+# LSQR
+
+LSQR solves $Ax = b$ or the least-squares problem $\min \|Ax - b\|$; it is algebraically equivalent to applying CG to the normal equations, but has better numerical properties.
+
+## Usage
+
+```@docs
+lsqr
+lsqr!
+```
+
+## Implementation details
+
+Adapted from: [http://web.stanford.edu/group/SOL/software/lsqr/](http://web.stanford.edu/group/SOL/software/lsqr/).
diff --git a/docs/src/linear_systems/minres.md b/docs/src/linear_systems/minres.md
new file mode 100644
index 00000000..83a83e01
--- /dev/null
+++ b/docs/src/linear_systems/minres.md
@@ -0,0 +1,27 @@
+# [MINRES](@id MINRES)
+
+MINRES is a short-recurrence version of GMRES for solving $Ax = b$ approximately for $x$ where $A$ is a symmetric, Hermitian, skew-symmetric or skew-Hermitian linear operator and $b$ the right-hand side vector.
+
+## Usage
+
+```@docs
+minres
+minres!
+```
+
+## Implementation details
+MINRES exploits the tridiagonal structure of the Hessenberg matrix. Although MINRES is mathematically equivalent to GMRES, it might not be equivalent in finite precision. MINRES updates the solution as
+
+$$x := x_0 + (V R^{-1}) (Q^*\|r_0\|e_1)$$
+
+where $V$ is the orthonormal basis for the Krylov subspace and $QR$ is the QR-decomposition of the Hessenberg matrix. Note that the brackets are placed slightly differently from how GMRES would update the residual.
+
+MINRES computes $V$ and $W = VR^{-1}$ via a three-term recurrence, using only the last column of $R$. Therefore we pre-allocate only six vectors, save only the last two entries of $Q^*\|r_0\|e_1$ and part of the last column of the Hessenberg matrix.
+
+!!! note "Real and complex arithmetic"
+    If $A$ is Hermitian, then the Hessenberg matrix will be real. This is exploited in the current implementation.
+
+    If $A$ is skew-Hermitian, the diagonal of the Hessenberg matrix will be imaginary, and hence we use complex arithmetic in that case.
+
+!!! tip
+    MINRES can be used as an [iterator](@ref Iterators).
\ No newline at end of file
diff --git a/docs/src/linear_systems/stationary.md b/docs/src/linear_systems/stationary.md
new file mode 100644
index 00000000..c922b75d
--- /dev/null
+++ b/docs/src/linear_systems/stationary.md
@@ -0,0 +1,39 @@
+# [Stationary methods](@id Stationary)
+
+Stationary methods are typically used as smoothers in multigrid methods, where only very few iterations are applied to get rid of high-frequency components in the error. The implementations of stationary methods have this goal in mind, which means there is no other stopping criterion besides the maximum number of iterations.
+
+!!! note "CSC versus CSR"
+    Julia stores matrices column-major. In order to avoid cache misses, the implementations of our stationary methods traverse the matrices column-major. This deviates from classical textbook implementations. Also the SOR and SSOR methods cannot be computed efficiently in-place, but require a temporary vector.
+
+    When it comes to `SparseMatrixCSC`, we precompute in all stationary methods an integer array of the indices of the diagonal to avoid expensive searches in each iteration.
+
+## Jacobi
+
+```@docs
+jacobi
+jacobi!
+```
+
+## Gauss-Seidel
+
+```@docs
+gauss_seidel
+gauss_seidel!
+```
+
+## [Successive over-relaxation (SOR)](@id SOR)
+
+```@docs
+sor
+sor!
+```
+
+## [Symmetric successive over-relaxation (SSOR)](@id SSOR)
+
+```@docs
+ssor
+ssor!
+```
+
+!!! tip
+    All stationary methods can be used as [iterators](@ref Iterators).
\ No newline at end of file
diff --git a/docs/src/preconditioning.md b/docs/src/preconditioning.md
new file mode 100644
index 00000000..7d6d28c7
--- /dev/null
+++ b/docs/src/preconditioning.md
@@ -0,0 +1,21 @@
+# [Preconditioning](@id Preconditioning)
+
+Many iterative solvers have the option to provide left and right preconditioners (`Pl` and `Pr` resp.) in order to speed up convergence or prevent stagnation. They transform a problem $Ax = b$ into a better conditioned system $(P_l^{-1}AP_r^{-1})y = P_l^{-1}b$, where $x = P_r^{-1}y$.
+
+These preconditioners should support the operations
+
+- `A_ldiv_B!(y, P, x)` computes `P \ x` in-place of `y`;
+- `A_ldiv_B!(P, x)` computes `P \ x` in-place of `x`;
+- and `P \ x`.
+
+If no preconditioners are passed to the solver, the method will default to
+
+```julia
+Pl = Pr = IterativeSolvers.Identity()
+```
+
+## Available preconditioners
+IterativeSolvers.jl itself does not provide any other preconditioners besides `Identity()`, but recommends the following external packages:
+
+- [ILU.jl](https://github.com/haampie/ILU.jl) for incomplete LU decompositions (using drop tolerance);
+- [IncompleteSelectedInversion.jl](https://github.com/ettersi/IncompleteSelectedInversion.jl) for incomplete LDLt decompositions.
\ No newline at end of file
diff --git a/docs/src/randomized.md b/docs/src/randomized.md
new file mode 100644
index 00000000..707b84a1
--- /dev/null
+++ b/docs/src/randomized.md
@@ -0,0 +1,35 @@
+# [Randomized algorithms](@id Randomized)
+
+The methods below are based on [^Halko2011].
+
+```@docs
+reig
+rsvdfact
+rsvd_fnkz
+```
+
+## Condition number estimate
+
+```@docs
+rcond
+```
+
+## Extremal eigenvalue estimates
+
+```@docs
+reigmin
+reigmax
+```
+
+## Norm estimate
+
+```@docs
+rnorm
+rnorms
+```
+
+[^Halko2011]: Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp.
"Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM review 53.2 (2011): 217-288. + +[^Dixon1983]: Dixon, John D. "Estimating extremal eigenvalues and condition numbers of matrices." SIAM Journal on Numerical Analysis 20.4 (1983): 812-814. + +[^Liberty2007]: Liberty, Edo, et al. "Randomized algorithms for the low-rank approximation of matrices." Proceedings of the National Academy of Sciences 104.51 (2007): 20167-20172. \ No newline at end of file diff --git a/docs/src/svd/svdl.md b/docs/src/svd/svdl.md new file mode 100644 index 00000000..bb3915eb --- /dev/null +++ b/docs/src/svd/svdl.md @@ -0,0 +1,17 @@ +# [Golub-Kahan-Lanczos (SVDL)](@id SVDL) + +The SVDL method computes a partial, approximate SVD decomposition of a general linear operator $A$. + +## Usage + +```@docs +svdl +``` + +## Implementation details + +The implementation of thick restarting follows closely that of SLEPc as described in [^Hernandez2008]. Thick restarting can be turned off by setting `k = maxiter`, but most of the time this is not desirable. + +The singular vectors are computed directly by forming the Ritz vectors from the product of the Lanczos vectors `L.P`/`L.Q` and the singular vectors of `L.B`. Additional accuracy in the singular triples can be obtained using inverse iteration. + +[^Hernandez2008]: Vicente Hernández, José E. Román, and Andrés Tomás. "A robust and efficient parallel SVD solver based on restarted Lanczos bidiagonalization." Electronic Transactions on Numerical Analysis 31 (2008): 68-85. diff --git a/docs/src/user_manual.md b/docs/src/user_manual.md deleted file mode 100644 index a3890d55..00000000 --- a/docs/src/user_manual.md +++ /dev/null @@ -1,162 +0,0 @@ -# Manual - -## Installation - -The package can be installed with a simple instruction. - -```julia -julia> Pkg.add("IterativeSolvers") -``` - -After installing the package, if you wish to use the latest features of the -package you must switch to the master branch with `Pkg.clone`. - -```julia -julia> Pkg.checkout("IterativeSolvers") -``` - -## Interface - -All linear-algebraic routines will take as input a linear operator `A` that maps -vectors to vectors. Typically `A` is a `Matrix` or a `SparseMatrixCSC`, but since -`A` is not explicitly typed, any linear operator that supports matrix operations -can be used as well. This makes it possible to apply solvers *matrix-free*. In -IterativeSolvers.jl we strongly recommend [LinearMaps.jl](https://github.com/Jutho/LinearMaps.jl) -for non-matrix types of `A`. - -For matrix-free types of `A` the following interface is expected to be defined: - -`A*v` computes the matrix-vector product on a `v::AbstractVector`. - -`A_mul_B!(y, A, v)` computes the matrix-vector product on a `v::AbstractVector` in-place. - -`eltype(A)` returns the element type implicit in the equivalent matrix representation of `A`. - -`size(A, d)` is defined and returns the nominal dimensions along the `d`th axis -in the equivalent matrix representation of `A`. - -### Solvers - -All linear solvers have a common function declaration (with a few exceptions). - -``` -solver(A, b::Vector; kwargs...) -solver!(x, A, b::Vector; kwargs...) -``` - -In the case of eigenproblems or singular value decompositions: - -``` -eigsolver(A; kwargs...) -eigsolver!(x, A; kwargs...) -``` - -`A` is a linear operator as described above. - -`b` is the vector to be solved. - -`x` is a vector for the initial guess. In the case of a mutating call this -parameter will be overwritten. 
(Mutating functions end with `!`) - -Output will be the solution to the system. - - -### Additional arguments - -Keyword names will vary depending on the method, however some -of them will always have the same spelling: - -`tol`: stopping tolerance of the method. When a method accepts more than one -tolerance they are enumerated with a letter prefix, e.g `atol`, `btol`, `ctol`, -etc. - -`verbose`: print information about the running method. - -`maxiter`: maximum number of allowed iterations. - -`Pl`: left preconditioner. (When applicable) - -`Pr`: right preconditioner. (When applicable) - -`log::Bool = false`: output an extra element of type `ConvergenceHistory` -containing extra information of the method execution. - -### `log` keyword - -All solvers contain the `log` keyword. This is to be used when obtaining -more information is required, to use it place the set `log` to `true`. - -```julia -x, ch = cg(Master, rand(10, 10), rand(10) log=true) -svd, L, ch = svdl(Master, rand(100, 100), log=true) -``` - -The function will now return one more parameter of type `ConvergenceHistory`. - -## ConvergenceHistory - -A [`ConvergenceHistory`](@ref) instance stores information of a solver. - -Number of iterations. - -```julia -ch.iters -``` - -Convergence status. - -```julia -ch.isconverged -``` - -Stopping tolerances. (A `Symbol` key is needed to access) - -```julia -ch[:tol] -``` - -Maximum number of iterations per restart. (Only on restarted methods) - -```julia -nrests(ch) -``` - -Number of matrix-vectors and matrix-transposed-vector products. - -```julia -nprods(ch) -``` - -Data stored on each iteration, accessed information can be either a vector -or matrix. This data can be a lot of things, most commonly residual. -(A `Symbol` key is needed to access) - -```julia -ch[:resnorm] #Vector or Matrix -ch[:resnorm, x] #Vector or Matrix element -ch[:resnorm, x, y] #Matrix element -``` - -The available keys of each method is described in the [Public Documentation](@ref). - -### Plotting - -`ConvergeHistory` provides a recipe to use with the package [Plots.jl](https://github.com/tbreloff/Plots.jl), this makes it really easy to -plot on different plot backends. There are two recipes provided: - -One for the whole `ConvergenceHistory`. - -```julia -plot(ch) -``` - -The other one to plot data binded to a key. - -```julia -_, ch = gmres(rand(10,10), rand(10), maxiter = 100, log=true) -plot(ch, :resnorm, sep = :blue) -``` - -*Plot additional keywords* - -`sep::Symbol = :white`: color of the line separator in restarted methods. diff --git a/src/IterativeSolvers.jl b/src/IterativeSolvers.jl index a09de62e..78c3b51e 100644 --- a/src/IterativeSolvers.jl +++ b/src/IterativeSolvers.jl @@ -16,8 +16,8 @@ include("factorization.jl") include("hessenberg.jl") #Linear solvers -include("stationary_sparse.jl") include("stationary.jl") +include("stationary_sparse.jl") include("cg.jl") include("minres.jl") include("bicgstabl.jl") diff --git a/src/bicgstabl.jl b/src/bicgstabl.jl index 985c9d8b..013f5bf4 100644 --- a/src/bicgstabl.jl +++ b/src/bicgstabl.jl @@ -144,8 +144,43 @@ end # Classical API +""" + bicgstabl(A, b, l; kwargs...) -> x, [history] + +Same as [`bicgstabl!`](@ref), but allocates a solution vector `x` initialized with zeros. +""" bicgstabl(A, b, l = 2; kwargs...) = bicgstabl!(zerox(A, b), A, b, l; initial_zero = true, kwargs...) +""" + bicgstabl!(x, A, b, l; kwargs...) -> x, [history] + +# Arguments + +- `A`: linear operator; +- `b`: right hand side (vector); +- `l::Int = 2`: Number of GMRES steps. 
+
+## Keywords
+
+- `max_mv_products::Int = min(20, size(A, 1))`: maximum number of matrix-vector products.
+For BiCGStab(l) this is a less dubious term than "number of iterations";
+- `Pl = Identity()`: left preconditioner of the method;
+- `tol::Real = sqrt(eps(real(eltype(b))))`: tolerance for stopping condition `|r_k| / |r_0| ≤ tol`.
+  Note that (1) the true residual norm is never computed during the iterations,
+  only an approximation; and (2) if a preconditioner is given, the stopping condition is based on the
+  *preconditioned residual*.
+
+# Return values
+
+**if `log` is `false`**
+
+- `x`: approximate solution.
+
+**if `log` is `true`**
+
+- `x`: approximate solution;
+- `history`: convergence history.
+"""
 function bicgstabl!(x, A, b, l = 2;
     tol = sqrt(eps(real(eltype(b)))),
     max_mv_products::Int = min(20, size(A, 1)),
@@ -182,73 +217,3 @@

     log ? (iterable.x, history) : iterable.x
 end
-
-#################
-# Documentation #
-#################
-
-let
-doc_call = "bicgstab(A, b, l)"
-doc!_call = "bicgstab!(x, A, b, l)"
-
-doc_msg = "Solve A*x = b with the BiCGStab(l)"
-doc!_msg = "Overwrite `x`.\n\n" * doc_msg
-
-doc_arg = ""
-doc!_arg = """`x`: initial guess, overwrite final estimation."""
-
-doc_version = (doc_call, doc_msg, doc_arg)
-doc!_version = (doc!_call, doc!_msg, doc!_arg)
-
-docstring = String[]
-
-#Build docs
-for (call, msg, arg) in (doc_version, doc!_version) #Start
-    push!(docstring,
-"""
-$call
-
-$msg
-
-# Arguments
-
-$arg
-
-`A`: linear operator.
-
-`b`: right hand side (vector).
-
-`l::Int = 2`: Number of GMRES steps.
-
-## Keywords
-
-`Pl = Identity()`: left preconditioner of the method.
-
-`tol::Real = sqrt(eps(real(eltype(b))))`: tolerance for stopping condition
-`|r_k| / |r_0| ≤ tol`. Note that:
-
-1. The actual residual is never computed during the iterations; only an
-approximate residual is used.
-2. If a preconditioner is given, the stopping condition is based on the
-*preconditioned residual*.
-
-`max_mv_products::Int = min(30, size(A, 1))`: maximum number of matrix
-vector products. For BiCGStab this is a less dubious criterion than maximum
-number of iterations.
-
-# Output
-
-`x`: approximated solution.
-`residual`: last approximate residual norm
-
-# References
-[1] Sleijpen, Gerard LG, and Diederik R. Fokkema. "BiCGstab(l) for
-linear equations involving unsymmetric matrices with complex spectrum."
-Electronic Transactions on Numerical Analysis 1.11 (1993): 2000.
-"""
-    )
-end
-
-@doc docstring[1] -> bicgstabl
-@doc docstring[2] -> bicgstabl!
-end
\ No newline at end of file
diff --git a/src/cg.jl b/src/cg.jl
index 0508beb5..e8cfdfe3 100644
--- a/src/cg.jl
+++ b/src/cg.jl
@@ -135,18 +135,58 @@ function cg_iterator!(x, A, b, Pl = Identity();
     end
 end

+"""
+    cg(A, b; kwargs...) -> x, [history]
+
+Same as [`cg!`](@ref), but allocates a solution vector `x` initialized with zeros.
+"""
 cg(A, b; kwargs...) = cg!(zerox(A, b), A, b; initially_zero = true, kwargs...)

+"""
+    cg!(x, A, b; kwargs...) -> x, [history]
+
+# Arguments
+
+- `x`: Initial guess, will be updated in-place;
+- `A`: linear operator;
+- `b`: right-hand side.
+
+## Keywords
+
+- `initially_zero::Bool`: If `true` assumes that `iszero(x)` so that one
+  matrix-vector product can be saved when computing the initial
+  residual vector;
+- `Pl = Identity()`: left preconditioner of the method. Should be symmetric,
+  positive-definite like `A`.
+- `tol::Real = sqrt(eps(real(eltype(b))))`: tolerance for stopping condition `|r_k| / |r_0| ≤ tol`;
+- `maxiter::Integer = min(20, size(A, 1))`: maximum number of iterations;
+- `verbose::Bool = false`: print method information;
+- `log::Bool = false`: keep track of the residual norm in each iteration;
+
+# Output
+
+**if `log` is `false`**
+
+- `x`: approximated solution.
+
+**if `log` is `true`**
+
+- `x`: approximated solution.
+- `ch`: convergence history.
+
+**ConvergenceHistory keys**
+
+- `:tol` => `::Real`: stopping tolerance.
+- `:resnorm` => `::Vector`: residual norm at each iteration.
+"""
 function cg!(x, A, b;
     tol = sqrt(eps(real(eltype(b)))),
     maxiter::Integer = min(20, size(A, 1)),
-    plot = false,
     log::Bool = false,
     verbose::Bool = false,
     Pl = Identity(),
     kwargs...
 )
-    (plot & !log) && error("Can't plot when log keyword is false")
     history = ConvergenceHistory(partial = !log)
     history[:tol] = tol
     log && reserve!(history, :resnorm, maxiter + 1)
@@ -167,87 +207,6 @@
     verbose && println()
     log && setconv(history, converged(iterable))
     log && shrink!(history)
-    plot && showplot(history)

     log ? (iterable.x, history) : iterable.x
 end
-
-#################
-# Documentation #
-#################
-
-let
-#Initialize parameters
-doc_call = """    cg(A, b)
-"""
-doc!_call = """    cg!(x, A, b)
-"""
-
-doc_msg = "Solve A*x=b with the conjugate gradients method."
-doc!_msg = "Overwrite `x`.\n\n" * doc_msg
-
-doc_arg = ""
-doc!_arg = """`x`: initial guess, overwrite final estimation."""
-
-doc_version = (doc_call, doc_msg, doc_arg)
-doc!_version = (doc!_call, doc!_msg, doc!_arg)
-
-i = 0
-docstring = Vector(2)
-
-#Build docs
-for (call, msg, arg) in [doc_version, doc!_version] #Start
-i+=1
-docstring[i] = """
-$call
-
-$msg
-
-If `log` is set to `true` is given, method will output a tuple `x, ch`. Where
-`ch` is a `ConvergenceHistory` object. Otherwise it will only return `x`.
-
-# Arguments
-
-$arg
-
-`A`: linear operator.
-
-`b`: right hand side.
-
-## Keywords
-
-`Pl = Identity()`: left preconditioner of the method.
-
-`tol::Real = size(A,2)*eps()`: stopping tolerance.
-
-`maxiter::Integer = size(A,2)`: maximum number of iterations.
-
-`verbose::Bool = false`: print method information.
-
-`log::Bool = false`: output an extra element of type `ConvergenceHistory`
-containing extra information of the method execution.
-
-# Output
-
-**if `log` is `false`**
-
-`x`: approximated solution.
-
-**if `log` is `true`**
-
-`x`: approximated solution.
-
-`ch`: convergence history.
-
-**ConvergenceHistory keys**
-
-`:tol` => `::Real`: stopping tolerance.
-
-`:resnom` => `::Vector`: residual norm at each iteration.
-
-"""
-end
-
-@doc docstring[1] -> cg
-@doc docstring[2] -> cg!
-end
\ No newline at end of file
diff --git a/src/chebyshev.jl b/src/chebyshev.jl
index fc68818b..07425ae0 100644
--- a/src/chebyshev.jl
+++ b/src/chebyshev.jl
@@ -90,13 +90,50 @@ function chebyshev_iterable!(x, A, b, λmin::Real, λmax::Real;
     )
 end

-####################
-# API method calls #
-####################
+"""
+    chebyshev(A, b, λmin::Real, λmax::Real; kwargs...) -> x, [history]
+
+Same as [`chebyshev!`](@ref), but allocates a solution vector `x` initialized with zeros.
+"""
 chebyshev(A, b, λmin::Real, λmax::Real; kwargs...) =
     chebyshev!(zerox(A, b), A, b, λmin, λmax; initially_zero = true, kwargs...)

+"""
+    chebyshev!(x, A, b, λmin::Real, λmax::Real; kwargs...) -> x, [history]
+
+Solve Ax = b for symmetric, definite matrices A using Chebyshev iteration.
+
+# Arguments
+
+- `x`: initial guess, will be updated in-place;
+- `A`: linear operator;
+- `b`: right-hand side;
+- `λmin::Real`: lower bound for the real eigenvalues;
+- `λmax::Real`: upper bound for the real eigenvalues.
+
+## Keywords
+
+- `initially_zero::Bool = false`: if `true` assumes that `iszero(x)` so that one
+  matrix-vector product can be saved when computing the initial
+  residual vector;
+- `tol`: tolerance for stopping condition `|r_k| / |r_0| ≤ tol`;
+- `maxiter::Int`: maximum number of iterations;
+- `Pl = Identity()`: left preconditioner;
+- `log::Bool = false`: keep track of the residual norm in each iteration;
+- `verbose::Bool = false`: print convergence information during the iterations.
+
+# Return values
+
+**if `log` is `false`**
+
+- `x`: approximate solution.
+
+**if `log` is `true`**
+
+- `x`: approximate solution;
+- `history`: convergence history.
+"""
 function chebyshev!(x, A, b, λmin::Real, λmax::Real;
     Pl = Identity(),
     tol::Real=sqrt(eps(real(eltype(b)))),
@@ -125,83 +162,3 @@

     log ? (x, history) : x
 end
-
-#################
-# Documentation #
-#################
-
-let
-#Initialize parameters
-doc_call = """    chebyshev!(A, b, λmin, λmax)
-"""
-doc!_call = """    chebyshev!(x, A, b, λmin, λmax)
-"""
-
-doc_msg = "Solve A*x=b using the chebyshev method."
-doc!_msg = "Overwrite `x`.\n\n" * doc_msg
-
-doc_arg = ""
-doc!_arg = """`x`: initial guess, overwrite final estimation."""
-
-doc_version = (doc_call, doc_msg, doc_arg)
-doc!_version = (doc!_call, doc!_msg, doc!_arg)
-
-i=0
-docstring = Vector(2)
-
-#Build docs
-for (call, msg, arg) in [doc_version, doc!_version] #Start
-i+=1
-docstring[i] = """
-$call
-
-$msg
-
-If `log` is set to `true` is given, method will output a tuple `x, ch`. Where
-`ch` is a `ConvergenceHistory` object. Otherwise it will only return `x`.
-
-# Arguments
-
-$arg
-
-`A`: linear operator.
-
-`b`: right hand side.
-
-## Keywords
-
-`Pl = 1`: left preconditioner of the method.
-
-`tol::Real = sqrt(eps())`: stopping tolerance.
-
-`maxiter::Integer = size(A,2)^3`: maximum number of iterations.
-
-`verbose::Bool = false`: print method information.
-
-`log::Bool = false`: output an extra element of type `ConvergenceHistory`
-containing extra information of the method execution.
-
-# Output
-
-**if `log` is `false`**
-
-`x`: approximated solution.
-
-**if `log` is `true`**
-
-`x`: approximated solution.
-
-`ch`: convergence history.
-
-**ConvergenceHistory keys**
-
-`:tol` => `::Real`: stopping tolerance.
-
-`:resnom` => `::Vector`: residual norm at each iteration.
-
-"""
-end
-
-@doc docstring[1] -> chebyshev
-@doc docstring[2] -> chebyshev!
-end
diff --git a/src/gmres.jl b/src/gmres.jl
index 6aee06a0..22c866d2 100644
--- a/src/gmres.jl
+++ b/src/gmres.jl
@@ -128,8 +128,48 @@ function gmres_iterable!(x, A, b;
     )
 end

+"""
+    gmres(A, b; kwargs...) -> x, [history]
+
+Same as [`gmres!`](@ref), but allocates a solution vector `x` initialized with zeros.
+"""
 gmres(A, b; kwargs...) = gmres!(zeros(b), A, b; initially_zero = true, kwargs...)

+"""
+    gmres!(x, A, b; kwargs...) -> x, [history]
+
+Solves the problem ``Ax = b`` with restarted GMRES.
+
+# Arguments
+
+- `x`: Initial guess, will be updated in-place;
+- `A`: linear operator;
+- `b`: right-hand side.
+
+## Keywords
+
+- `initially_zero::Bool`: If `true` assumes that `iszero(x)` so that one
+  matrix-vector product can be saved when computing the initial
+  residual vector;
+- `tol`: relative tolerance;
+- `restart::Int`: restarts GMRES after the specified number of iterations;
+- `maxiter::Int`: maximum number of inner iterations of GMRES;
+- `Pl`: left preconditioner;
+- `Pr`: right preconditioner;
+- `log::Bool`: keep track of the residual norm in each iteration;
+- `verbose::Bool`: print convergence information during the iterations.
+
+# Return values
+
+**if `log` is `false`**
+
+- `x`: approximate solution.
+
+**if `log` is `true`**
+
+- `x`: approximate solution;
+- `history`: convergence history.
+"""
 function gmres!(x, A, b;
     Pl = Identity(),
     Pr = Identity(),
@@ -137,7 +177,7 @@
     restart::Int = min(20, length(b)),
     maxiter::Int = restart,
     log::Bool = false,
-    initially_zero = false,
+    initially_zero::Bool = false,
     verbose::Bool = false
 )
     history = ConvergenceHistory(partial = !log, restart = restart)
diff --git a/src/history.jl b/src/history.jl
index afe2c0b4..eacf6184 100644
--- a/src/history.jl
+++ b/src/history.jl
@@ -28,7 +28,7 @@ Store general and in-depth information about an iterative method.

 `isconverged::Bool`: convergence of the method.

 `data::Dict{Symbol,Any}`: Stores all the information stored during the method execution.
-It stores tolerances, residuals and other information, e.g. ritz values in [svdl](@ref).
+It stores tolerances, residuals and other information, e.g. Ritz values in [`svdl`](@ref).

 # Constructors
diff --git a/src/idrs.jl b/src/idrs.jl
index 79dd4d54..bddada89 100644
--- a/src/idrs.jl
+++ b/src/idrs.jl
@@ -1,14 +1,45 @@
 export idrs, idrs!

-####################
-# API method calls #
-####################
-
+"""
+    idrs(A, b; s = 8) -> x, [history]
+
+Same as [`idrs!`](@ref), but allocates a solution vector `x` initialized with zeros.
+"""
 idrs(A, b; kwargs...) = idrs!(zerox(A,b), A, b; kwargs...)

+"""
+    idrs!(x, A, b; s = 8) -> x, [history]
+
+Solve the problem ``Ax = b`` approximately with IDR(s), where `s` is the dimension of the
+shadow space.
+
+# Arguments
+
+- `x`: Initial guess, will be updated in-place;
+- `A`: linear operator;
+- `b`: right-hand side.
+
+## Keywords
+
+- `s::Integer = 8`: dimension of the shadow space;
+- `tol`: relative tolerance;
+- `maxiter::Int`: maximum number of iterations;
+- `log::Bool`: keep track of the residual norm in each iteration;
+- `verbose::Bool`: print convergence information during the iterations.
+
+# Return values
+
+**if `log` is `false`**
+
+- `x`: approximate solution.
+
+**if `log` is `true`**
+
+- `x`: approximate solution;
+- `history`: convergence history.
+"""
 function idrs!(x, A, b;
-    s = 8, tol=sqrt(eps(typeof(real(b[1])))), maxiter=length(x)^2,
+    s = 8, tol=sqrt(eps(real(eltype(b)))), maxiter=length(x)^2,
     log::Bool=false, kwargs...
 )
     history = ConvergenceHistory(partial=!log)
@@ -38,12 +69,6 @@

 @inline linsys_op(x, A) = A*x

-"""
-The Induced Dimension Reduction method is a family of simple and fast Krylov
-subspace algorithms for solving large nonsymmetric linear systems. The idea
-behind the IDR(s) variant is to generate residuals that are in the nested
-subspaces of shrinking dimensions.
-""" function idrs_method!(log::ConvergenceHistory, X, op, args, C::T, s::Number, tol::Number, maxiter::Number; smoothing::Bool=false, verbose::Bool=false ) where {T} @@ -190,86 +215,3 @@ function idrs_method!(log::ConvergenceHistory, X, op, args, C::T, setconv(log, 0<=normR `::Real`: stopping tolerance. - -`:resnom` => `::Vector`: residual norm at each iteration. -""" -end - -@doc docstring[1] -> idrs -@doc docstring[2] -> idrs! -end diff --git a/src/lsmr.jl b/src/lsmr.jl index 1c4e8f0b..4b07304e 100644 --- a/src/lsmr.jl +++ b/src/lsmr.jl @@ -2,12 +2,68 @@ export lsmr, lsmr! using Base.LinAlg -#################### -# API method calls # -#################### +""" + lsmr(A, b; kwrags...) -> x, [history] +Same as [`lsmr!`](@ref), but allocates a solution vector `x` initialized with zeros. +""" lsmr(A, b; kwargs...) = lsmr!(zerox(A, b), A, b; kwargs...) +""" + lsmr!(x, A, b; kwargs...) -> x, [history] + +Minimizes ``\|Ax - b\|^2 + \|λx\|^2`` in the Euclidean norm. If multiple solutions +exists the minimum norm solution is returned. + +The method is based on the Golub-Kahan bidiagonalization process. It is +algebraically equivalent to applying MINRES to the normal equations +``(A^*A + λ^2I)x = A^*b``, but has better numerical properties, +especially if ``A`` is ill-conditioned. + +# Arguments +- `x`: Initial guess, will be updated in-place; +- `A`: linear operator; +- `b`: right-hand side. + +## Keywords + +- `λ::Number = 0`: lambda. +- `atol::Number = 1e-6`, `btol::Number = 1e-6`: stopping tolerances. If both are + 1.0e-9 (say), the final residual norm should be accurate to about 9 digits. + (The final `x` will usually have fewer correct digits, + depending on `cond(A)` and the size of damp). +- `conlim::Number = 1e8`: stopping tolerance. `lsmr` terminates if an estimate + of `cond(A)` exceeds conlim. For compatible systems Ax = b, + conlim could be as large as 1.0e+12 (say). For least-squares + problems, conlim should be less than 1.0e+8. + Maximum precision can be obtained by setting +- `atol` = `btol` = `conlim` = zero, but the number of iterations + may then be excessive. +- `maxiter::Integer = min(20,length(b))`: maximum number of iterations. +- `log::Bool`: keep track of the residual norm in each iteration; +- `verbose::Bool`: print convergence information during the iterations. + +# Return values + +**if `log` is `false`** + +- `x`: approximated solution. + +**if `log` is `true`** + +- `x`: approximated solution. +- `ch`: convergence history. + +**ConvergenceHistory keys** + +- `:atol` => `::Real`: atol stopping tolerance. +- `:btol` => `::Real`: btol stopping tolerance. +- `:ctol` => `::Real`: ctol stopping tolerance. +- `:anorm` => `::Real`: anorm. +- `:rnorm` => `::Real`: rnorm. +- `:cnorm` => `::Real`: cnorm. +- `:resnom` => `::Vector`: residual norm at each iteration. +""" function lsmr!(x, A, b; maxiter::Integer = max(size(A,1), size(A,2)), log::Bool=false, kwargs... 
function lsmr!(x, A, b;
    maxiter::Integer = max(size(A,1), size(A,2)),
    log::Bool=false, kwargs...
@@ -232,108 +288,3 @@ for (name, symbol) in ((:Ac_mul_B!, 'T'), (:A_mul_B!, 'N'))
        end
    end
end
-
-#################
-# Documentation #
-#################
-
-let
-#Initialize parameters
-doc_call = """    lsmr(A, b)
-"""
-doc!_call = """    lsmr!(x, A, b)
-"""
-
-doc_msg = "Minimize ||Ax-b||^2 + λ^2 ||x||^2 for A*x=b.\n"
-doc!_msg = "Overwrite `x`.\n\n" * doc_msg
-
-doc_arg = ""
-doc!_arg = """`x`: initial guess, overwrite final estimation."""
-
-doc_version = (lsmr, doc_call, doc_msg, doc_arg)
-doc!_version = (lsmr!, doc!_call, doc!_msg, doc!_arg)
-
-i=0
-docstring = Vector(2)
-
-#Build docs
-for (func, call, msg, arg) in [doc_version, doc!_version]
-i+=1
-docstring[i] = """
-$call
-
-$msg
-
-The method is based on the Golub-Kahan bidiagonalization process. It is
-algebraically equivalent to applying MINRES to the normal equation (ATA+λ2I)x=ATb,
-but has better numerical properties, especially if A is ill-conditioned.
-
-If `log` is set to `true` is given, method will output a tuple `x, ch`. Where
-`ch` is a `ConvergenceHistory` object. Otherwise it will only return `x`.
-
-# Arguments
-
-$arg
-
-`A`: linear operator.
-
-`b`: right hand side.
-
-## Keywords
-
-`λ::Number = 0`: lambda.
-
-`atol::Number = 1e-6`, `btol::Number = 1e-6`: stopping tolerances. If both are
-1.0e-9 (say), the final residual norm should be accurate to about 9 digits.
-(The final `x` will usually have fewer correct digits,
-depending on `cond(A)` and the size of damp).
-
-`conlim::Number = 1e8`: stopping tolerance. `lsmr` terminates if an estimate
-of `cond(A)` exceeds conlim. For compatible systems Ax = b,
-conlim could be as large as 1.0e+12 (say). For least-squares
-problems, conlim should be less than 1.0e+8.
-Maximum precision can be obtained by setting
-`atol` = `btol` = `conlim` = zero, but the number of iterations
-may then be excessive.
-
-`maxiter::Integer = min(20,length(b))`: maximum number of iterations.
-
-`verbose::Bool = false`: print method information.
-
-`log::Bool = false`: output an extra element of type `ConvergenceHistory`
-containing extra information of the method execution.
-
-# Output
-
-**if `log` is `false`**
-
-`x`: approximated solution.
-
-**if `log` is `true`**
-
-`x`: approximated solution.
-
-`ch`: convergence history.
-
-**ConvergenceHistory keys**
-
-`:atol` => `::Real`: atol stopping tolerance.
-
-`:btol` => `::Real`: btol stopping tolerance.
-
-`:ctol` => `::Real`: ctol stopping tolerance.
-
-`:anorm` => `::Real`: anorm.
-
-`:rnorm` => `::Real`: rnorm.
-
-`:cnorm` => `::Real`: cnorm.
-
-`:resnom` => `::Vector`: residual norm at each iteration.
-
-"""
-end
-
-@doc docstring[1] -> lsmr
-@doc docstring[2] -> lsmr!
-end
diff --git a/src/lsqr.jl b/src/lsqr.jl
index 48127e3a..92383cea 100644
--- a/src/lsqr.jl
+++ b/src/lsqr.jl
@@ -1,11 +1,68 @@
export lsqr, lsqr!

-####################
-# API method calls #
-####################
+"""
+    lsqr(A, b; kwargs...) -> x, [history]
+Same as [`lsqr!`](@ref), but allocates a solution vector `x` initialized with zeros.
+"""
lsqr(A, b; kwargs...) = lsqr!(zerox(A, b), A, b; kwargs...)

+"""
+    lsqr!(x, A, b; kwargs...) -> x, [history]
+
+Minimizes ``\|Ax - b\|^2 + \|damp*x\|^2`` in the Euclidean norm. If multiple solutions
+exist, the minimum-norm solution is returned.
+
+The method is based on the Golub-Kahan bidiagonalization process.
+It is algebraically equivalent to applying CG to the normal equations
+``(A^*A + λ^2I)x = A^*b`` but has better numerical properties,
+especially if ``A`` is ill-conditioned.
+
+# Arguments
+- `x`: Initial guess, will be updated in-place;
+- `A`: linear operator;
+- `b`: right-hand side.
+
+## Keywords
+
+- `damp::Number = 0`: damping parameter.
+- `atol::Number = 1e-6`, `btol::Number = 1e-6`: stopping tolerances. If both are
+  1.0e-9 (say), the final residual norm should be accurate to about 9 digits.
+  (The final `x` will usually have fewer correct digits,
+  depending on `cond(A)` and the size of damp).
+- `conlim::Number = 1e8`: stopping tolerance. `lsqr` terminates if an estimate
+  of `cond(A)` exceeds conlim. For compatible systems Ax = b,
+  conlim could be as large as 1.0e+12 (say). For least-squares
+  problems, conlim should be less than 1.0e+8.
+  Maximum precision can be obtained by setting
+  `atol` = `btol` = `conlim` = zero, but the number of iterations
+  may then be excessive.
+- `maxiter::Integer = min(20,length(b))`: maximum number of iterations.
+- `verbose::Bool = false`: print method information.
+- `log::Bool = false`: output an extra element of type `ConvergenceHistory`
+  containing extra information of the method execution.
+
+# Return values
+
+**if `log` is `false`**
+
+- `x`: approximated solution.
+
+**if `log` is `true`**
+
+- `x`: approximated solution.
+- `ch`: convergence history.
+
+**ConvergenceHistory keys**
+
+- `:atol` => `::Real`: atol stopping tolerance.
+- `:btol` => `::Real`: btol stopping tolerance.
+- `:ctol` => `::Real`: ctol stopping tolerance.
+- `:anorm` => `::Real`: anorm.
+- `:rnorm` => `::Real`: rnorm.
+- `:cnorm` => `::Real`: cnorm.
+- `:resnorm` => `::Vector`: residual norm at each iteration.
+"""
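A minimal sketch of the `lsqr` call documented above, checked against the dense least-squares solve `A \ b`; the sizes and tolerances are illustrative assumptions:

```julia
using IterativeSolvers

srand(3)
A = rand(300, 100)
b = rand(300)

x = lsqr(A, b, atol = 1e-9, btol = 1e-9, maxiter = 500)
norm(x - A \ b)   # both solve the same least-squares problem
```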
function lsqr!(x, A, b;
    maxiter::Int=max(size(A,1), size(A,2)),
    log::Bool=false, kwargs...
@@ -217,111 +274,3 @@ function lsqr_method!(log::ConvergenceHistory, x, A, b;
    verbose && @printf("\n")
    x
end
-
-#################
-# Documentation #
-#################
-
-let
-#Initialize parameters
-doc_call = """    lsqr(A, b)
-"""
-doc!_call = """    lsqr!(x, A, b)
-"""
-
-doc_msg = """LSQR solves Ax = b or min ||b - Ax||^2 if damp = 0,
-or min ||(b) - (  A  )x|| otherwise.
-       ||(0)   (damp*I) ||^2.
-"""
-doc!_msg = "Overwrite `x`.\n\n" * doc_msg
-
-doc_arg = ""
-doc!_arg = """`x`: initial guess, overwrite final estimation."""
-
-doc_version = (lsqr, doc_call, doc_msg, doc_arg)
-doc!_version = (lsqr!, doc!_call, doc!_msg, doc!_arg)
-
-i=0
-docstring = Vector(2)
-
-#Build docs
-for (func, call, msg, arg) in [doc_version, doc!_version]
-i+=1
-docstring[i] = """
-$call
-
-$msg
-
-The method is based on the Golub-Kahan bidiagonalization process.
-It is algebraically equivalent to applying CG to the normal equation (ATA+λ2I)x=ATb,
-but has better numerical properties, especially if A is ill-conditioned.
-
-If `log` is set to `true` is given, method will output a tuple `x, ch`. Where
-`ch` is a `ConvergenceHistory` object. Otherwise it will only return `x`.
-
-# Arguments
-
-$arg
-
-`A`: linear operator.
-
-`b`: right hand side.
-
-## Keywords
-
-`damp::Number = 0`: damping parameter.
-
-`atol::Number = 1e-6`, `btol::Number = 1e-6`: stopping tolerances. If both are
-1.0e-9 (say), the final residual norm should be accurate to about 9 digits.
-(The final `x` will usually have fewer correct digits,
-depending on `cond(A)` and the size of damp).
-
-`conlim::Number = 1e8`: stopping tolerance. `lsmr` terminates if an estimate
-of `cond(A)` exceeds conlim. For compatible systems Ax = b,
-conlim could be as large as 1.0e+12 (say). For least-squares
-problems, conlim should be less than 1.0e+8.
-Maximum precision can be obtained by setting
-`atol` = `btol` = `conlim` = zero, but the number of iterations
-may then be excessive.
-
-`maxiter::Integer = min(20,length(b))`: maximum number of iterations.
-
-`verbose::Bool = false`: print method information.
-
-`log::Bool = false`: output an extra element of type `ConvergenceHistory`
-containing extra information of the method execution.
-
-# Output
-
-**if `log` is `false`**
-
-`x`: approximated solution.
-
-**if `log` is `true`**
-
-`x`: approximated solution.
-
-`ch`: convergence history.
-
-**ConvergenceHistory keys**
-
-`:atol` => `::Real`: atol stopping tolerance.
-
-`:btol` => `::Real`: btol stopping tolerance.
-
-`:ctol` => `::Real`: ctol stopping tolerance.
-
-`:anorm` => `::Real`: anorm.
-
-`:rnorm` => `::Real`: rnorm.
-
-`:cnorm` => `::Real`: cnorm.
-
-`:resnom` => `::Vector`: residual norm at each iteration.
-
-"""
-end
-
-@doc docstring[1] -> lsqr
-@doc docstring[2] -> lsqr!
-end
diff --git a/src/minres.jl b/src/minres.jl
index 603756be..d9bf99af 100644
--- a/src/minres.jl
+++ b/src/minres.jl
@@ -1,25 +1,8 @@
-export minres_iterable, minres
+export minres_iterable, minres, minres!

import Base.LinAlg: BLAS.axpy!, givensAlgorithm
import Base: start, next, done

-"""
-MINRES is full GMRES for Hermitian matrices, finding
-xₖ := x₀ + Vₖyₖ, where Vₖ is an orthonormal basis for
-the Krylov subspace K(A, b - Ax₀). Since
-|rₖ| = |b - Axₖ| = |r₀ - Vₖ₊₁Hₖyₖ| = | |r₀|e₁ - Hₖyₖ|, we solve
-Hₖyₖ = |r₀|e₁ in least-square sense. As the Hessenberg matrix
-is tridiagonal, its QR decomp Hₖ = QₖRₖ has Rₖ with only 3 diagonals,
-easily obtained with Givens rotations. The least-squares problem is
-solved as Hₖ'Hₖyₖ = Hₖ'|r₀|e₁ => yₖ = inv(Rₖ) Qₖ'|r₀|e₁, so
-xₖ := x₀ + [Vₖ inv(Rₖ)] [Qₖ'|r₀|e₁]. Now the main difference with GMRES
-is the placement of the brackets. MINRES computes Wₖ = Vₖ inv(Rₖ) via 3-term
-recurrences using only the last column of R, and computes last
-two terms of Qₖ'|r₀|e₁ as well.
-The active column of the Hessenberg matrix H is updated in place to form the
-active column of R. Note that for Hermitian matrices the H matrix is purely
-real, while for skew-Hermitian matrices its diagonal is purely imaginary.
-"""
mutable struct MINRESIterable{matT, vecT <: DenseVector, smallVecT <: DenseVector, rotT <: Number, realT <: Real}
    A::matT
    skew_hermitian::Bool
@@ -171,6 +154,41 @@ function next(m::MINRESIterable, iteration::Int)
    m.resnorm, iteration + 1
end

+"""
+    minres!(x, A, b; kwargs...) -> x, [history]
+
+Solve ``Ax = b`` for (skew-)Hermitian matrices `A` using MINRES.
+
+# Arguments
+
+- `x`: initial guess, will be updated in-place;
+- `A`: linear operator;
+- `b`: right-hand side.
+
+## Keywords
+
+- `initially_zero::Bool = false`: if `true` assumes that `iszero(x)` so that one
+  matrix-vector product can be saved when computing the initial
+  residual vector;
+- `skew_hermitian::Bool = false`: if `true` assumes that `A` is skew-symmetric or skew-Hermitian;
+- `tol`: tolerance for stopping condition `|r_k| / |r_0| ≤ tol`. Note that the residual is computed only approximately;
+- `maxiter::Int`: maximum number of iterations;
+- `Pl`: left preconditioner;
+- `Pr`: right preconditioner;
+- `log::Bool = false`: keep track of the residual norm in each iteration;
+- `verbose::Bool = false`: print convergence information during the iterations.
+
+# Return values
+
+**if `log` is `false`**
+
+- `x`: approximate solution.
+
+**if `log` is `true`**
+
+- `x`: approximate solution;
+- `history`: convergence history.
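To illustrate the docstring above: MINRES applies to Hermitian matrices that may be indefinite, where CG would not be appropriate. A hedged sketch with an invented indefinite test matrix:

```julia
using IterativeSolvers

srand(4)
B = rand(100, 100)
A = B + B' - 10 * eye(100)   # symmetric but indefinite
b = rand(100)

x, history = minres(A, b, tol = 1e-8, maxiter = 200, log = true)
norm(A * x - b) / norm(b)
```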
+""" function minres!(x, A, b; skew_hermitian::Bool = false, verbose::Bool = false, @@ -209,74 +227,9 @@ function minres!(x, A, b; log ? (iterable.x, history) : iterable.x end -minres(A, b; kwargs...) = minres!(zerox(A, b), A, b; initially_zero = true, kwargs...) - -################# -# Documentation # -################# - -let -doc_call = "minres(A, b)" -doc!_call = "minres!(x, A, b)" - -doc_msg = "Using initial guess zeros(b)." -doc!_msg = "Overwrites `x`." - -doc_arg = "" -doc!_arg = """`x`: initial guess, overwrite final approximation.""" - -doc_version = (doc_call, doc_msg, doc_arg) -doc!_version = (doc!_call, doc!_msg, doc!_arg) - -docstring = String[] - -#Build docs -for (call, msg, arg) in (doc_version, doc!_version) #Start - push!(docstring, """ -$call - -Solve A*x = b for (skew-)Hermitian matrices A using MINRES. The method is mathematically -equivalent to unrestarted GMRES, but exploits symmetry of A, resulting in short -recurrences requiring only 6 vectors of storage. MINRES might be slightly less -stable than full GMRES. - -$msg - -# Arguments - -$arg - -`A`: linear operator. - -`b`: right hand side (vector). - -## Keywords - -`tol::Real = sqrt(eps(real(eltype(b))))`: tolerance for stopping condition -`|r_k| / |r_0| ≤ tol`. Note that the residual is computed only approximately. - -`maxiter::Int = min(30, size(A, 1))`: maximum number of iterations. - -`verbose::Bool = false` output during the iterations + minres(A, b; kwargs...) -> x, [history] -`log::Bool = false` enables logging, see **Output**. - -# Output - -**if `log` is `false`** - -`x`: approximated solution. - -**if `log` is `true`** - -`x`: approximated solution. - -`ch`: convergence history. +Same as [`minres!`](@ref), but allocates a solution vector `x` initialized with zeros. """ - ) -end - -@doc docstring[1] -> minres -@doc docstring[2] -> minres! -end \ No newline at end of file +minres(A, b; kwargs...) = minres!(zerox(A, b), A, b; initially_zero = true, kwargs...) diff --git a/src/rlinalg.jl b/src/rlinalg.jl index 8ec1a69b..fe64eeb5 100644 --- a/src/rlinalg.jl +++ b/src/rlinalg.jl @@ -15,26 +15,24 @@ Compute randomized gaussian matrix normalized by column. # Arguments -`el::Type`: element type. - -`m::Int`: number of rows. - -`n::Int`: number of columns. +- `el::Type`: element type. +- `m::Int`: number of rows. +- `n::Int`: number of columns. ## Keywords -`normalize::Bool = true`: normalize output. +- `normalize::Bool = true`: normalize output. # Output **Without `n`** -`Ω`: vector containing Gaussian random numbers of type `el`. +- `Ω`: vector containing Gaussian random numbers of type `el`. **With `n`** -`Ω`: matrix of dimensions `m` x `n` containing Gaussian random numbers of -type `el`. +- `Ω`: matrix of dimensions `m` x `n` containing Gaussian random numbers of + type `el`. """ function randnn(el, m::Int, normalize::Bool=true) if el <: Real @@ -69,26 +67,23 @@ Compute a probabilistic upper bound on the norm of a matrix `A`. # Arguments -`A`: matrix whose norm to estimate. - -`mvps::Int`: number of matrix-vector products to compute. +- `A`: matrix whose norm to estimate. +- `mvps::Int`: number of matrix-vector products to compute. ## Keywords -`p::Real=0.05`: probability of upper bound failing. +- `p::Real=0.05`: probability of upper bound failing. # Output Estimate of ‖A‖. -# See also - -see [`rnorms`](@ref) for a different estimator that uses premultiplying by both -`A` and `A'`. +See also [`rnorms`](@ref) for a different estimator that uses +premultiplying by both `A` and `A'`. 
# References

-\cite[Lemma 4.1]{Halko2011}
+Lemma 4.1 of [^Halko2011]
"""
function rnorm(A, r::Int, p::Real=0.05)
    @assert 0<p<1
@@ -292,17 +249,15 @@ naive randomized rangefinding using stochastic randomized Fourier transforms.

# Arguments

-`A` : input matrix. Must support `size(A)` and premultiply.
-
-`l::Int` : number of basis vectors to compute.
-
-`p::Int = 0` : oversampling parameter. The number of extra basis vectors to use
-in the computation, which get discarded at the end.
+- `A`: input matrix. Must support `size(A)` and premultiply.
+- `l::Int`: number of basis vectors to compute.
+- `p::Int = 0`: oversampling parameter. The number of extra basis vectors to use
+  in the computation, which get discarded at the end.

# Output

-`::Matrix`: matrix of dimension `size(A,1)` x `l` containing the basis
-vectors of the computed subspace of `A`.
+- `::Matrix`: matrix of dimension `size(A,1)` x `l` containing the basis
+  vectors of the computed subspace of `A`.

# Implementation note

@@ -312,7 +267,7 @@ for dense `A` computes the QR factorization using Householder reflectors.

# References

-Algorithm 4.5 of \cite{Halko2011}
+Algorithm 4.5 of [^Halko2011]
"""
function rrange_f(A, l::Int)
    n = size(A, 2)
@@ -329,10 +284,9 @@ using exact projection.

# Arguments

-`A`: input matrix. Must support postmultiply.
-
-`Q`: matrix containing basis vectors of the subspace whose restriction to is
-desired.
+- `A`: input matrix. Must support postmultiply.
+- `Q`: matrix containing basis vectors of the subspace to which the
+  restriction of `A` is desired.

# Output

@@ -340,7 +294,7 @@ desired.

# References

-Algorithm 5.1 of \cite{Halko2011}
+Algorithm 5.1 of [^Halko2011]
"""
function svdfact_restricted(A, Q, n::Int)
    B=Q'A
@@ -356,18 +310,17 @@ using exact projection.

# Arguments

-`A`: input matrix. Must support postmultiply.
-
-`Q`: matrix containing basis vectors of the subspace whose restriction to is
-desired.
+- `A`: input matrix. Must support postmultiply.
+- `Q`: matrix containing basis vectors of the subspace to which the
+  restriction of `A` is desired.

# Output

-`::Vector`: estimated singular values of `A`.
+- `::Vector`: estimated singular values of `A`.

# References

-Algorithm 5.1 of \cite{Halko2011}
+Algorithm 5.1 of [^Halko2011]
"""
function svdvals_restricted(A, Q, n::Int)
    B=Q'A
@@ -380,19 +333,18 @@ end
Compute the SVD factorization of `A` restricted to the subspace spanned by `Q`
using row extraction.

-*Note:* \cite[Remark 5.2]{Halko2011} recommends input of `Q` of the form `Q=A*Ω`
+*Note:* Remark 5.2 of [^Halko2011] recommends input of `Q` of the form `Q=A*Ω`
where `Ω` is a sample computed by `randn(n,l)` or even `srft(l)`.

# Arguments

-`A`: input matrix. Must support postmultiply
-
-`Q`: matrix containing basis vectors of the subspace whose restriction to is
-desired. Need not be orthogonal or normalized.
+- `A`: input matrix. Must support postmultiply.
+- `Q`: matrix containing basis vectors of the subspace to which the
+  restriction of `A` is desired. Need not be orthogonal or normalized.

# Output

-`::SVD`: singular value decomposition.
+- `::SVD`: singular value decomposition.

# See also

interpolative decomposition `idfact`.

# References

-Algorithm 5.2 of \cite{Halko2011}
+Algorithm 5.2 of [^Halko2011]
"""
function svdfact_re(A, Q)
    F = idfact(Q)
@@ -420,18 +372,17 @@ spanned by `Q` using row extraction.

# Arguments

-`A::Hermitian`: input matrix. Must be `Hermitian` and support pre- and post-multiply.
-
-`Q`: orthonormal matrix containing basis vectors of the subspace whose
-restriction to is desired.
+- `A::Hermitian`: input matrix. Must be `Hermitian` and support pre- and post-multiply.
+- `Q`: orthonormal matrix containing basis vectors of the subspace to which
+  the restriction of `A` is desired.

# Output

-`::Base.LinAlg.Eigen`: eigen factorization.
+- `::Base.LinAlg.Eigen`: eigen factorization.

# References

-Algorithm 5.3 of \cite{Halko2011}
+Algorithm 5.3 of [^Halko2011]
"""
function eigfact_restricted(A::Hermitian, Q)
    B = Q'A*Q
@@ -445,19 +396,18 @@ end
Compute the spectral (`Eigen`) factorization of `A` restricted to the subspace
spanned by `Q` using row extraction.

-*Note:* \cite[Remark 5.2]{Halko2011} recommends input of `Q` of the form `Q=A*Ω`
+*Note:* Remark 5.2 of [^Halko2011] recommends input of `Q` of the form `Q=A*Ω`
where `Ω` is a sample computed by `randn(n,l)` or even `srft(l)`.

# Arguments

-`A::Hermitian`: input matrix. Must be `Hermitian` and support pre- and post-multiply.
-
-`Q`: matrix containing basis vectors of the subspace whose restriction to is
-desired. Need not be orthogonal or normalized.
+- `A::Hermitian`: input matrix. Must be `Hermitian` and support pre- and post-multiply.
+- `Q`: matrix containing basis vectors of the subspace to which the
+  restriction of `A` is desired. Need not be orthogonal or normalized.

# Output

-`::Base.LinAlg.Eigen`: eigen factorization.
+- `::Base.LinAlg.Eigen`: eigen factorization.

# See also

interpolative decomposition `idfact()`.

# References

-Algorithm 5.4 of \cite{Halko2011}
+Algorithm 5.4 of [^Halko2011]
"""
function eigfact_re(A::Hermitian, Q)
    X, J = idfact(Q)
@@ -485,14 +435,13 @@ spanned by `Q` using the Nyström method.

# Arguments

-`A`: input matrix. Must be positive semidefinite.
-
-`Q`: orthonormal matrix containing basis vectors of the subspace whose
-restriction to is desired.
+- `A`: input matrix. Must be positive semidefinite.
+- `Q`: orthonormal matrix containing basis vectors of the subspace to which
+  the restriction of `A` is desired.

# Output

-`::Base.LinAlg.Eigen`: eigen factorization.
+- `::Base.LinAlg.Eigen`: eigen factorization.

# See also

that can be Cholesky decomposed.

# References

-Algorithm 5.5 of \cite{Halko2011}
+Algorithm 5.5 of [^Halko2011]
"""
function eigfact_nystrom(A, Q)
    B₁=A*Q
@@ -520,21 +469,18 @@ product involving `A`.

# Arguments

-`A::Hermitian`: input matrix.
-
-`Ω`: sample matrix for the column space, e.g. `randn(n, l)` or `srft(l)`.
-
-`Ω̃;`: sample matrix for the row space. Not neeeded for `Hermitian` matrices.
-
-`At = A'`: computes transpose of input matrix.
+- `A::Hermitian`: input matrix.
+- `Ω`: sample matrix for the column space, e.g. `randn(n, l)` or `srft(l)`.
+- `Ω̃`: sample matrix for the row space. Not needed for `Hermitian` matrices.
+- `At = A'`: computes transpose of input matrix.

# Output

-`::Base.LingAlg.Eigen`: eigen factorization.
+- `::Base.LinAlg.Eigen`: eigen factorization.

# References

-Algorithm 5.6 of \cite{Halko2011}
+Algorithm 5.6 of [^Halko2011]
"""
function eigfact_onepass(A::Hermitian, Ω)
    Y=A*Ω; Q = full(qrfact!(Y)[:Q])
@@ -562,34 +508,17 @@ algorithm.

# Arguments

-`A`: input matrix.
-
-`l::Int`: number of eigenpairs to find.
+- `A`: input matrix.
+- `l::Int`: number of eigenpairs to find.

# Output

-`::Base.LinAlg.Eigen`: eigen decomposition.
+- `::Base.LinAlg.Eigen`: eigen decomposition.

# Implementation note

This is a wrapper around `eigfact_onepass()` which uses the randomized
samples found using `srft(l)`.
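Assuming `reig` behaves as the wrapper described above, a hedged usage sketch on a random Hermitian matrix (the matrix and the number of requested eigenpairs are invented):

```julia
using IterativeSolvers

srand(6)
B = rand(200, 200)
A = Hermitian(B + B')
F = reig(A, 10)   # Base.LinAlg.Eigen with roughly 10 approximate eigenpairs
F.values          # approximate eigenvalues
```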
-
-# References
-
-```bibtex
-@article{Halko2011,
-    author = {Halko, N and Martinsson, P G and Tropp, J A},
-    doi = {10.1137/090771806},
-    journal = {SIAM Review},
-    month = jan,
-    number = {2},
-    pages = {217--288},
-    title = {Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions},
-    volume = {53},
-    year = {2011}
-}
-```
+
+[^Halko2011]:
+    Halko, N., Martinsson, P. G., and Tropp, J. A. "Finding structure with
+    randomness: Probabilistic algorithms for constructing approximate matrix
+    decompositions." SIAM Review 53.2 (2011): 217-288.
"""
reig(A::Hermitian, l::Int) = eigfact_onepass(A, srft(l))
reig(A, l::Int) = eigfact_onepass(A, srft(l), srft(l))
diff --git a/src/rsvd_fnkz.jl b/src/rsvd_fnkz.jl
index d843b693..2b6f9881 100644
--- a/src/rsvd_fnkz.jl
+++ b/src/rsvd_fnkz.jl
@@ -12,46 +12,32 @@ X ⊗ Y = OuterProduct{promote_type(eltype(X), eltype(Y))}(X, Y)
    rsvd_fnkz(A, k)

Compute the randomized SVD by iterative refinement from randomly selected
-columns/rows.
+columns/rows [^Friedland2006].

# Arguments

-`A`: matrix whose SVD is desired.
-
-`k::Int`: desired rank of approximation (`k ≤ min(m, n)`).
+- `A`: matrix whose SVD is desired;
+- `k::Int`: desired rank of approximation (`k ≤ min(m, n)`).

## Keywords

-`l::Int = k`: number of columns/rows to sample at each iteration (`1 ≤ l ≤ k`).
-
-`N::Int = minimum(size(A))`: maximum number of iterations.
-
-`ϵ::Real = prod(size(A))*eps()`: relative threshold for convergence, as
-measured by growth of the spectral norm.
-
-`method::Symbol = :eig`: problem to solve.
-* `:eig`: eigenproblem.
-* `:svd`: singular problem.
-
-`verbose::Bool = false`: print convergence information at each iteration.
+- `l::Int = k`: number of columns/rows to sample at each iteration (`1 ≤ l ≤ k`);
+- `N::Int = minimum(size(A))`: maximum number of iterations;
+- `ϵ::Real = prod(size(A))*eps()`: relative threshold for convergence, as
+  measured by growth of the spectral norm;
+- `method::Symbol = :eig`: problem to solve.
+  1. `:eig`: eigenproblem.
+  2. `:svd`: singular value problem.
+- `verbose::Bool = false`: print convergence information at each iteration.

-# Output
+# Return value

SVD object of `rank ≤ k`.

-# References
-
-```bibtex
-@inproceedings{,
-    author={Friedland, S. and Niknejad, A. and Kaveh, Mostafa and Zare, H.},
-    booktitle={System of Systems Engineering, 2006 IEEE/SMC International Conference on},
-    title={Fast Monte-Carlo low rank approximations for matrices},
-    year={2006},
-    month={April},
-    pages={218--223},
-    doi={10.1109/SYSOSE.2006.1652299}
-}
-```
+[^Friedland2006]:
+    Friedland, Shmuel, et al. "Fast Monte-Carlo low rank approximations for
+    matrices." System of Systems Engineering, 2006 IEEE/SMC International
+    Conference on. IEEE, 2006.
"""
function rsvd_fnkz(A, k::Int;
    l::Int=k,
    N::Int=minimum(size(A)),
    verbose::Bool=false,
diff --git a/src/simple.jl b/src/simple.jl
index fcfa2da6..65b359bb 100644
--- a/src/simple.jl
+++ b/src/simple.jl
@@ -1,7 +1,7 @@
import Base: start, next, done

#Simple methods
-export powm, invpowm
+export powm, powm!, invpowm, invpowm!

mutable struct PowerMethodIterable{matT, vecT <: AbstractVector, numT <: Number, eigvalT <: Number}
    A::matT
@@ -60,20 +60,71 @@ function powm_iterable(A; kwargs...)
    powm_iterable!(A, x0; kwargs...)
end

-####################
-# API method calls #
-####################
+"""
+    powm(B; kwargs...) -> λ, x, [history]

-function powm(A; kwargs...)
-    x0 = rand(Complex{real(eltype(A))}, size(A, 1))
-    scale!(x0, one(eltype(A)) / norm(x0))
-    powm!(A, x0; kwargs...)
+See [`powm!`](@ref). Calls `powm!(B, x0; kwargs...)` with
+`x0` initialized as a random, complex unit vector.
+"""
+function powm(B; kwargs...)
+    x0 = rand(Complex{real(eltype(B))}, size(B, 1))
+    scale!(x0, one(eltype(B)) / norm(x0))
+    powm!(B, x0; kwargs...)
end

-function powm!(A, x;
-    tol = eps(real(eltype(A))) * size(A, 2) ^ 3,
-    maxiter = size(A, 1),
-    shift = zero(eltype(A)),
+"""
+    powm!(B, x; shift = zero(eltype(B)), inverse::Bool = false, kwargs...) -> λ, x, [history]
+
+By default finds the approximate eigenpair `(λ, x)` of `B` where `|λ|` is largest.
+
+# Arguments
+- `B`: linear map, see the note below.
+- `x`: normalized initial guess. Don't forget to use complex arithmetic when necessary.
+
+## Keywords
+- `tol::Real = eps(real(eltype(B))) * size(B, 2) ^ 3`: stopping tolerance for the residual norm;
+- `maxiter::Integer = size(B,1)`: maximum number of iterations;
+- `log::Bool`: keep track of the residual norm in each iteration;
+- `verbose::Bool`: print convergence information during the iterations.
+
+!!! note "Shift-and-invert"
+    When applying shift-and-invert to ``Ax = λx`` with `inverse = true` and `shift = ...`, note
+    that the role of `B * b` becomes computing `inv(A - shift*I) * b`. So rather than
+    passing the linear map ``A`` itself, pass a linear map `B` that has the action of
+    shift-and-invert. The eigenvalue is transformed back to an eigenvalue of the actual
+    matrix ``A``.
+
+# Return values
+
+**if `log` is `false`**
+- `λ::Number`: approximate eigenvalue computed as the Rayleigh quotient;
+- `x::Vector`: approximate eigenvector.
+
+**if `log` is `true`**
+- `λ::Number`: approximate eigenvalue computed as the Rayleigh quotient;
+- `x::Vector`: approximate eigenvector;
+- `history`: convergence history.
+
+**ConvergenceHistory keys**
+
+- `:tol` => `::Real`: stopping tolerance;
+- `:resnorm` => `::Vector`: residual norm at each iteration.
+
+# Examples
+
+```julia
+using LinearMaps
+σ = 1.0 + 1.3im
+A = rand(Complex128, 50, 50)
+F = lufact(A - σ * I)
+Fmap = LinearMap{Complex128}((y, x) -> A_ldiv_B!(y, F, x), 50, ismutating = true)
+λ, x = powm(Fmap, inverse = true, shift = σ, tol = 1e-4, maxiter = 200)
+```
+"""
+function powm!(B, x;
+    tol = eps(real(eltype(B))) * size(B, 2) ^ 3,
+    maxiter = size(B, 1),
+    shift = zero(eltype(B)),
    inverse::Bool = false,
    log::Bool = false,
    verbose::Bool = false
@@ -83,7 +134,7 @@ function powm!(A, x;
    reserve!(history, :resnorm, maxiter)
    verbose && @printf("=== powm ===\n%4s\t%7s\n", "iter", "resnorm")

-    iterable = powm_iterable!(A, x, tol = tol, maxiter = maxiter)
+    iterable = powm_iterable!(B, x, tol = tol, maxiter = maxiter)

    for (iteration, residual) = enumerate(iterable)
        nextiter!(history, mvps = 1)
@@ -102,30 +153,14 @@ function powm!(A, x;
    log ? (λ, x, history) : (λ, x)
end

-function invpowm(B; kwargs...)
-    x0 = rand(Complex{real(eltype(B))}, size(B, 1))
-    scale!(x0, one(eltype(B)) / norm(x0))
-    invpowm!(B, x0; kwargs...)
-end
-
-invpowm!(B, x0; kwargs...) = powm!(B, x0; inverse = true, kwargs...)
+"""
+    invpowm(B; shift = σ, kwargs...) -> λ, x, [history]

-#################
-# Documentation #
-#################
+Find the approximate eigenpair `(λ, x)` of ``A`` near `shift`, where `B`
+is a linear map that has the effect `B * v = inv(A - σ*I) * v`.

-let
-#Initialize parameters
-doc1_call = """    powm(A)
-"""
-doc2_call = """    invpowm(B)
-"""
-doc1_msg = """Find the largest eigenvalue `λ` (in absolute value) of `A` and its associated eigenvector `x`
-using the power method.
-"""
-doc2_msg = """For an eigenvalue problem Ax = λx, find the closest eigenvalue in the complex plane to `shift`
-together with its associated eigenvector `x`.
The first argument `B` should be a mapping that has
-the effect of B * x = inv(A - shift * I) * x and should support the `A_mul_B!` operation.
+The method calls `powm!(B, x0; inverse = true, shift = σ)` with
+`x0` a random, complex unit vector. See [`powm!`](@ref).

# Examples

```julia
using LinearMaps
σ = 1.0 + 1.3im
A = rand(Complex128, 50, 50)
-F = lufact(A - UniformScaling(σ))
+F = lufact(A - σ * I)
Fmap = LinearMap{Complex128}((y, x) -> A_ldiv_B!(y, F, x), 50, ismutating = true)
λ, x = invpowm(Fmap, shift = σ, tol = 1e-4, maxiter = 200)
```
"""
-doc1_karg = ""
-doc2_karg = "`shift::Number=0`: shift to be applied to matrix A."
-
-doc1_version = (powm, doc1_call, doc1_msg, doc1_karg)
-doc2_version = (invpowm, doc2_call, doc2_msg, doc2_karg)
-
-i=0
-docstring = Vector(2)
-
-#Build docs
-for (func, call, msg, karg) in [doc1_version, doc2_version]
-i+=1
-docstring[i] = """
-$call
-
-$msg
-
-If `log` is set to `true` is given, method will output a tuple `λ, x, ch`. Where
-`ch` is a `ConvergenceHistory` object. Otherwise it will only return `λ, x`.
-
-# Arguments
-
-`A`: linear operator.
-
-## Keywords
-
-$karg
-
-`x = random unit vector`: initial eigenvector guess.
-
-`tol::Real = eps()*size(A,2)^3`: stopping tolerance.
-
-`maxiter::Integer = size(A,2)`: maximum number of iterations.
-
-`verbose::Bool = false`: verbose flag.
-
-`log::Bool = false`: output an extra element of type `ConvergenceHistory`
-containing extra information of the method execution.
-
-# Output
-
-**if `log` is `false`**
-
-`λ::Number`: eigenvalue
-
-`x::Vector`: eigenvector
-
-**if `log` is `true`**
-
-`eig::Real`: eigenvalue
-
-`x::Vector`: eigenvector
-
-`ch`: convergence history.
-
-**ConvergenceHistory keys**
+function invpowm(B; kwargs...)
+    x0 = rand(Complex{real(eltype(B))}, size(B, 1))
+    scale!(x0, one(eltype(B)) / norm(x0))
+    invpowm!(B, x0; kwargs...)
+end

-`:tol` => `::Real`: stopping tolerance.
+"""
+    invpowm!(B, x0; shift = σ, kwargs...) -> λ, x, [history]

-`:resnom` => `::Vector`: residual norm at each iteration.
+Find the approximate eigenpair `(λ, x)` of ``A`` near `shift`, where `B`
+is a linear map that has the effect `B * v = inv(A - σ*I) * v`.

+The method calls `powm!(B, x0; inverse = true, shift = σ)`. See [`powm!`](@ref).
"""
-end
-
-@doc docstring[1] -> powm
-@doc docstring[2] -> invpowm
-end
+invpowm!(B, x0; kwargs...) = powm!(B, x0; inverse = true, kwargs...)
diff --git a/src/stationary.jl b/src/stationary.jl
index e0e0cffc..684d8699 100644
--- a/src/stationary.jl
+++ b/src/stationary.jl
@@ -1,8 +1,7 @@
-#Stationary iterative methods
-#Templates, section 2.2
export jacobi, jacobi!, gauss_seidel, gauss_seidel!, sor, sor!, ssor, ssor!

import Base.LinAlg.SingularException
+import Base: start, next, done, getindex

function check_diag(A::AbstractMatrix)
    for i = 1 : size(A, 1)
@@ -12,13 +11,24 @@ function check_diag(A::AbstractMatrix)
    end
end

-####################
-# API method calls #
-####################
+"""
+    jacobi(A, b) -> x
+Same as [`jacobi!`](@ref), but allocates a solution vector `x` initialized with zeros.
+"""
jacobi(A::AbstractMatrix, b; kwargs...) = jacobi!(zerox(A, b), A, b; kwargs...)

-function jacobi!(x, A::AbstractMatrix, b; maxiter::Int=size(A, 2))
+"""
+    jacobi!(x, A::AbstractMatrix, b; maxiter=10) -> x
+
+Performs exactly `maxiter` Jacobi iterations.
+
+Allocates a single temporary vector and traverses `A` columnwise.
+
+Throws `Base.LinAlg.SingularException` when the diagonal has a zero. 
This check +is performed once beforehand. +""" +function jacobi!(x, A::AbstractMatrix, b; maxiter::Int=10) check_diag(A) iterable = DenseJacobiIterable(A, x, similar(x), b, maxiter) for _ = iterable end @@ -59,13 +69,24 @@ function next(j::DenseJacobiIterable, iteration::Int) nothing, iteration + 1 end -#################### -# API method calls # -#################### +""" + gauss_seidel(A, b) -> x +Same as [`gauss_seidel!`](@ref), but allocates a solution vector `x` initialized with zeros. +""" gauss_seidel(A::AbstractMatrix, b; kwargs...) = gauss_seidel!(zerox(A, b), A, b; kwargs...) -function gauss_seidel!(x, A::AbstractMatrix, b; maxiter::Int=size(A,2)) +""" + gauss_seidel!(x, A::AbstractMatrix, b; maxiter=10) -> x + +Performs exactly `maxiter` Gauss-Seidel iterations. + +Works fully in-place and traverses `A` columnwise. + +Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check +is performed once beforehand. +""" +function gauss_seidel!(x, A::AbstractMatrix, b; maxiter::Int=10) check_diag(A) iterable = DenseGaussSeidelIterable(A, x, b, maxiter) for _ = iterable end @@ -103,14 +124,25 @@ function next(s::DenseGaussSeidelIterable, iteration::Int) nothing, iteration + 1 end -#################### -# API method calls # -#################### +""" + sor(A, b, ω::Real) -> x +Same as [`sor!`](@ref), but allocates a solution vector `x` initialized with zeros. +""" sor(A::AbstractMatrix, b, ω::Real; kwargs...) = sor!(zerox(A, b), A, b, ω; kwargs...) -function sor!(x, A::AbstractMatrix, b, ω::Real; maxiter=size(A, 2)) +""" + sor!(x, A::AbstractMatrix, b, ω::Real; maxiter=10) -> x + +Performs exactly `maxiter` SOR iterations with relaxation parameter `ω`. + +Allocates a single temporary vector and traverses `A` columnwise. + +Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check +is performed once beforehand. +""" +function sor!(x, A::AbstractMatrix, b, ω::Real; maxiter::Int=10) check_diag(A) iterable = DenseSORIterable(A, x, similar(x), b, ω, maxiter) for _ = iterable end @@ -149,14 +181,26 @@ function next(s::DenseSORIterable, iteration::Int) nothing, iteration + 1 end -#################### -# API method calls # -#################### +""" + ssor(A, b, ω::Real) -> x +Same as [`ssor!`](@ref), but allocates a solution vector `x` initialized with zeros. +""" ssor(A::AbstractMatrix, b, ω::Real; kwargs...) = ssor!(zerox(A, b), A, b, ω; kwargs...) -function ssor!(x, A::AbstractMatrix, b, ω::Real; maxiter::Int=size(A,2)) +""" + ssor!(x, A::AbstractMatrix, b, ω::Real; maxiter=10) -> x + +Performs exactly `maxiter` SSOR iterations with relaxation parameter `ω`. Each iteration +is basically a forward *and* backward sweep of SOR. + +Allocates a single temporary vector and traverses `A` columnwise. + +Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check +is performed once beforehand. 
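A hedged sketch of the dense SSOR call documented above; the diagonally dominant test system is invented, and `ω = 1.2` is an arbitrary choice:

```julia
using IterativeSolvers

srand(7)
n = 50
A = rand(n, n) + 10 * eye(n)   # diagonally dominant, so the sweeps converge
b = rand(n)
x = zeros(n)

ssor!(x, A, b, 1.2, maxiter = 100)   # exactly 100 forward + backward sweeps
norm(A * x - b)
```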
+""" +function ssor!(x, A::AbstractMatrix, b, ω::Real; maxiter::Int=10) check_diag(A) iterable = DenseSSORIterable(A, x, similar(x), b, ω, maxiter) for _ = iterable end @@ -209,114 +253,3 @@ function next(s::DenseSSORIterable, iteration::Int) nothing, iteration + 1 end - -################# -# Documentation # -################# - -let -#Initialize parameters -doc1_call = """ jacobi(A, b) -""" -doc1!_call = """ jacobi!(x, A, b) -""" -doc2_call = """ gauss_seidel(A, b) -""" -doc2!_call = """ gauss_seidel!(x, A, b) -""" -doc3_call = """ sor(A, b, ω) -""" -doc3!_call = """ sor!(x, A, b, ω) -""" -doc4_call = """ ssor(A, b, ω) -""" -doc4!_call = """ ssor!(x, A, b, ω) -""" -doc1_msg = "Solve A*x=b with the Jacobi method." -doc2_msg = "Solve A*x=b with the Gauss-Seidel method." -doc3_msg = "Solve A*x=b with the successive overrelaxation method." -doc4_msg = "Solve A*x=b with the symmetric successive overrelaxation method." -doc1!_msg = "Overwrite `x`.\n\n" * doc1_msg -doc2!_msg = "Overwrite `x`.\n\n" * doc2_msg -doc3!_msg = "Overwrite `x`.\n\n" * doc3_msg -doc4!_msg = "Overwrite `x`.\n\n" * doc4_msg -doc1_arg = "" -doc2_arg = "" -doc3_arg = "`shift::Number=0`: shift to be applied to matrix A." -doc4_arg = "`shift::Number=0`: shift to be applied to matrix A." -doc1!_arg = "`x`: initial guess, overwrite final estimation." -doc2!_arg = "`x`: initial guess, overwrite final estimation." -doc3!_arg = "`x`: initial guess, overwrite final estimation.\n\n$doc3_arg" -doc4!_arg = "`x`: initial guess, overwrite final estimation.\n\n$doc4_arg" - -doc1_version = (jacobi, doc1_call, doc1_msg, doc1_arg) -doc2_version = (gauss_seidel, doc2_call, doc2_msg, doc2_arg) -doc3_version = (sor, doc3_call, doc3_msg, doc3_arg) -doc4_version = (ssor, doc4_call, doc4_msg, doc4_arg) -doc1!_version = (jacobi!, doc1!_call, doc1!_msg, doc1!_arg) -doc2!_version = (gauss_seidel!, doc2!_call, doc2!_msg, doc2!_arg) -doc3!_version = (sor!, doc3!_call, doc3!_msg, doc3!_arg) -doc4!_version = (ssor!, doc4!_call, doc4!_msg, doc4!_arg) - -i=0 -docstring = Vector(8) - -#Build docs -for (func, call, msg, arg) in [doc1_version, doc2_version, doc3_version, doc4_version, - doc1!_version, doc2!_version, doc3!_version, doc4!_version] -i+=1 -docstring[i] = """ -$call - -$msg - -`ch` is a `ConvergenceHistory` object. Otherwise it will only return `x`. -If `log` is set to `true` is given, method will output a tuple `x, ch`. - -# Arguments - -$arg - -`A`: linear operator. - -## Keywords - -`tol::Real = size(A,2)^3*eps()`: stopping tolerance. - -`maxiter::Integer = size(A,2)^2`: maximum number of iterations. - -`verbose::Bool = false`: verbose flag. - -`log::Bool = false`: output an extra element of type `ConvergenceHistory` -containing extra information of the method execution. - -# Output - -**if `log` is `false`** - -`x`: approximated solution. - -**if `log` is `true`** - -`x`: approximated solution. - -`ch`: convergence history. - -**ConvergenceHistory keys** - -`:tol` => `::Real`: stopping tolerance. - -`:resnom` => `::Vector`: residual norm at each iteration. - -""" -end - -@doc docstring[1] -> jacobi -@doc docstring[5] -> jacobi! -@doc docstring[2] -> gauss_seidel -@doc docstring[6] -> gauss_seidel! -@doc docstring[3] -> sor -@doc docstring[7] -> sor! -@doc docstring[4] -> ssor -@doc docstring[8] -> ssor! 
-end
diff --git a/src/stationary_sparse.jl b/src/stationary_sparse.jl
index cafecbaa..19c1bc25 100644
--- a/src/stationary_sparse.jl
+++ b/src/stationary_sparse.jl
@@ -232,18 +232,18 @@ end
jacobi_iterable(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector; maxiter::Int = 10) =
    JacobiIterable{eltype(x), typeof(x)}(OffDiagonal(A, DiagonalIndices(A)), x, similar(x), b, maxiter)

+
"""
-Jacobi iteration is a matrix-splitting iteration with
-`A = D + R` where `D` is the diagonal and `R` the off-diagonal.
-It iterates as x ← D \\ (b - R * x).
-
-The SparseMatrixCSC version of Jacobi iteration pre-computes the
-indices of the diagonal of `A` and stores a temporary vector.
-Storage costs are hence O(2n). For multiple solves with `A` it is
-recommended to reuse the underlying iterator, see the
-`JacobiIterable` type.
+    jacobi!(x, A::SparseMatrixCSC, b; maxiter=10) -> x
+
+Performs exactly `maxiter` Jacobi iterations.
+
+Allocates a temporary vector and precomputes the diagonal indices.
+
+Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check
+is performed once beforehand.
"""
-function jacobi!(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector; maxiter::Int = 10)
+function jacobi!(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector; maxiter::Int=10)
    iterable = jacobi_iterable(x, A, b, maxiter = maxiter)
    for item = iterable end
    iterable.x
@@ -280,15 +280,14 @@ function next(g::GaussSeidelIterable, iteration::Int)
end

"""
-Gauss-Seidel iteration is a matrix-splitting iteration with
-`A = L + U` where `L` is lower-triangular and `U` strictly upper-triangular.
-It iterates as x ← L \\ (b - U * x).
-
-Gauss-Seidel can be computed in-place even for the `SparseMatrixCSC` type,
-however, this implementation pre-computes the indices of the diagonal of `A`.
-Hence storage costs are O(n). For multiple solves with `A` it is
-recommended to reuse the underlying iterator, see the
-`GaussSeidelIterable` type.
+    gauss_seidel!(x, A::SparseMatrixCSC, b; maxiter=10) -> x
+
+Performs exactly `maxiter` Gauss-Seidel iterations.
+
+Works fully in-place, but precomputes the diagonal indices.
+
+Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check
+is performed once beforehand.
"""
function gauss_seidel!(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector; maxiter::Int = 10)
    iterable = gauss_seidel_iterable(x, A, b, maxiter = maxiter)
@@ -334,14 +333,14 @@ function sor_iterable(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector,
end

"""
-SOR takes a weighted average of the previous iterate and the Gauss-Seidel iterate
-x_i ← ω * x_gs + (1 - ω) * x_i for each successive component i = 1, ..., n.
-
-Although SOR can be computed fully in-place, this is difficult to accomplish for
-the `SparseMatrixCSC` type. A temporary vector is pre-allocated together with the
-indices of the diagonal of `A`. Hence storage costs are O(2n). For multiple solves
-with `A` it is recommended to reuse the underlying iterator, see the
-`SORIterable` type.
+    sor!(x, A::SparseMatrixCSC, b, ω::Real; maxiter=10) -> x
+
+Performs exactly `maxiter` SOR iterations with relaxation parameter `ω`.
+
+Allocates a temporary vector and precomputes the diagonal indices.
+
+Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check
+is performed once beforehand.
"""
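For the sparse path, a minimal sketch on a 1-D Poisson matrix; the test problem and `ω = 1.5` are illustrative assumptions, not part of this diff:

```julia
using IterativeSolvers

n = 100
# Tridiagonal SPD matrix stored as a SparseMatrixCSC
A = spdiagm((fill(-1.0, n - 1), fill(2.0, n), fill(-1.0, n - 1)), (-1, 0, 1))
b = ones(n)
x = zeros(n)

sor!(x, A, b, 1.5, maxiter = 1000)
norm(A * x - b)
```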
""" function sor!(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector, ω::Real; maxiter::Int = 10) iterable = sor_iterable(x, A, b, ω, maxiter = maxiter) @@ -394,14 +393,15 @@ function next(s::SSORIterable{T}, iteration::Int) where {T} end """ -Symmetric SOR (SSOR) combines two SOR sweeps (one forward and one backward) such that -the resulting iteration matrix is symmetric. - -Although SSOR can be computed fully in-place, this is difficult to accomplish for -the `SparseMatrixCSC` type. A temporary vector is pre-allocated together with the -indices of the diagonal of `A`. Hence storage costs are O(2n). For multiple solves -with `A` it is recommended to reuse the underlying iterator, see the -`SSORIterable` type. + ssor!(x, A::SparseMatrixCSC, b, ω::Real; maxiter=10) + +Performs exactly `maxiter` SSOR iterations with relaxation parameter `ω`. Each iteration +is basically a forward *and* backward sweep of SOR. + +Allocates a temporary vector and precomputes the diagonal indices. + +Throws `Base.LinAlg.SingularException` when the diagonal has a zero. This check +is performed once beforehand. """ function ssor!(x::AbstractVector, A::SparseMatrixCSC, b::AbstractVector, ω::Real; maxiter::Int = 10) iterable = ssor_iterable(x, A, b, ω, maxiter = maxiter) diff --git a/src/svdl.jl b/src/svdl.jl index 7b478369..f2112161 100644 --- a/src/svdl.jl +++ b/src/svdl.jl @@ -83,85 +83,74 @@ end #################### """ - svdl(A) + svdl(A) -> Σ, L, [history] Compute some singular values (and optionally vectors) using Golub-Kahan-Lanczos -bidiagonalization \\cite{Golub1965} with thick restarting \\cite{Wu2000}. +bidiagonalization [^Golub1965] with thick restarting [^Wu2000]. If `log` is set to `true` is given, method will output a tuple `X, L, ch`. Where `ch` is a `ConvergenceHistory` object. Otherwise it will only return `X, L`. # Arguments -`A` : The matrix or matrix-like object whose singular values are desired. +- `A` : The matrix or matrix-like object whose singular values are desired. ## Keywords -`nsv::Int = 6`: number of singular values requested. - -`v0 = random unit vector`: starting guess vector in the domain of `A`. -The length of `q` should be the number of columns in `A`. - -`k::Int = 2nsv`: maximum number of Lanczos vectors to compute before restarting. - -`j::Int = nsv`: number of vectors to keep at the end of the restart. -We don't recommend j < nsv. - -`maxiter::Int = minimum(size(A))`: maximum number of iterations to run. - -`verbose::Bool = false`: print information at each iteration. - -`tol::Real = √eps()`: maximum absolute error in each desired singular value. - -`reltol::Real=√eps()`: maximum error in each desired singular value relative to the -estimated norm of the input matrix. - -`method::Symbol=:ritz`: restarting algorithm to use. Valid choices are: - -* `:ritz`: Thick restart with Ritz values [Wu2000]. -* `:harmonic`: Restart with harmonic Ritz values [Baglama2005]. - -`vecs::Symbol = :none`: singular vectors to return. - -* `:both`: Both left and right singular vectors are returned. -* `:left`: Only the left singular vectors are returned. -* `:right`: Only the right singular vectors are returned. -* `:none`: No singular vectors are returned. - -`dolock::Bool=false`: If `true`, locks converged Ritz values, removing them -from the Krylov subspace being searched in the next macroiteration. - -`log::Bool = false`: output an extra element of type `ConvergenceHistory` -containing extra information of the method execution. 
-
-# Output
+- `nsv::Int = 6`: number of singular values requested;
+- `v0 = random unit vector`: starting guess vector in the domain of `A`.
+  The length of `v0` should be the number of columns in `A`;
+- `k::Int = 2nsv`: maximum number of Lanczos vectors to compute before restarting;
+- `j::Int = nsv`: number of vectors to keep at the end of the restart.
+  We don't recommend `j < nsv`;
+- `maxiter::Int = minimum(size(A))`: maximum number of iterations to run;
+- `verbose::Bool = false`: print information at each iteration;
+- `tol::Real = √eps()`: maximum absolute error in each desired singular value;
+- `reltol::Real=√eps()`: maximum error in each desired singular value relative to the
+  estimated norm of the input matrix;
+- `method::Symbol=:ritz`: restarting algorithm to use. Valid choices are:
+  1. `:ritz`: Thick restart with Ritz values [^Wu2000].
+  2. `:harmonic`: Restart with harmonic Ritz values [^Baglama2005].
+- `vecs::Symbol = :none`: singular vectors to return.
+  1. `:both`: Both left and right singular vectors are returned.
+  2. `:left`: Only the left singular vectors are returned.
+  3. `:right`: Only the right singular vectors are returned.
+  4. `:none`: No singular vectors are returned.
+- `dolock::Bool=false`: If `true`, locks converged Ritz values, removing them
+  from the Krylov subspace being searched in the next macroiteration;
+- `log::Bool = false`: output an extra element of type `ConvergenceHistory`
+  containing extra information of the method execution.
+
+# Return values

**if `log` is `false`**

-`Σ`: list of the desired singular values if `vecs == :none` (the default),
-otherwise returns an `SVD` object with the desired singular vectors filled in.
-
-`L`: computed partial factorizations of A.
+- `Σ`: list of the desired singular values if `vecs == :none` (the default), otherwise
+  returns an `SVD` object with the desired singular vectors filled in;
+- `L`: computed partial factorizations of `A`.

**if `log` is `true`**

-`Σ`: list of the desired singular values if `vecs == :none` (the default),
-otherwise returns an `SVD` object with the desired singular vectors filled in.
-
-`L`: computed partial factorizations of A.
-
-`ch::ConvergenceHistory`: convergence history.
+- `Σ`: list of the desired singular values if `vecs == :none` (the default), otherwise
+  returns an `SVD` object with the desired singular vectors filled in;
+- `L`: computed partial factorizations of `A`;
+- `history`: convergence history.

**ConvergenceHistory keys**

-`:betas` => `betas`: The history of the computed betas.
-
-`:Bs` => `Bs`: The history of the computed projected matrices.
-
-`:ritz` => `ritzvalhist`: Ritz values computed at each iteration.
-
-`:conv` => `convhist`: Convergence data.
-
+- `:betas` => `betas`: The history of the computed betas.
+- `:Bs` => `Bs`: The history of the computed projected matrices.
+- `:ritz` => `ritzvalhist`: Ritz values computed at each iteration.
+- `:conv` => `convhist`: Convergence data.
+
+[^Golub1965]:
+    Golub, Gene, and William Kahan. "Calculating the singular values and pseudo-inverse
+    of a matrix." Journal of the Society for Industrial and Applied Mathematics,
+    Series B: Numerical Analysis 2.2 (1965): 205-224.
+[^Wu2000]:
+    Wu, Kesheng, and Horst Simon. "Thick-restart Lanczos method for large symmetric
+    eigenvalue problems." SIAM Journal on Matrix Analysis and Applications 22.2
+    (2000): 602-616.
+[^Baglama2005]:
+    Baglama, James, and Lothar Reichel. "Augmented implicitly restarted Lanczos
+    bidiagonalization methods." SIAM Journal on Scientific Computing 27.1
+    (2005): 19-42.
"""
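To make the `svdl` signature concrete, a hedged sketch requesting the five largest singular values of a random matrix (sizes invented for illustration):

```julia
using IterativeSolvers

srand(8)
A = rand(500, 200)
Σ, L = svdl(A, nsv = 5)   # Σ ≈ svdvals(A)[1:5] up to the tolerances
```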
function svdl(A;
    nsv::Int=6, k::Int=2nsv,
    tol::Real=√eps(),
@@ -262,85 +251,39 @@ Determine if any singular values in a partial factorization have converged.

# Arguments

-`L::PartialFactorization` : a `PartialFactorization` computed by an iterative
-method such as `svdl`.
-
-`F::Base.LinAlg.SVD`: a `SVD` factorization computed for `L.B`.
-
-`k::Int` : number of singular values to check.
-
-`tol::Real`: absolute tolerance for a Ritz value to be considered converged.
-
-`reltol::Real`: relative tolerance for a Ritz value to be considered converged.
-
-`verbose::Bool = false`: if `true`, prints out all the results of convergence tests.
+- `L::PartialFactorization`: a `PartialFactorization` computed by an iterative
+  method such as `svdl`;
+- `F::Base.LinAlg.SVD`: a `SVD` factorization computed for `L.B`;
+- `k::Int`: number of singular values to check;
+- `tol::Real`: absolute tolerance for a Ritz value to be considered converged;
+- `reltol::Real`: relative tolerance for a Ritz value to be considered converged;
+- `verbose::Bool = false`: if `true`, prints out all the results of convergence tests.

# Implementation note

This convergence test routine uses a variety of different tests.
1. The crudest estimate of the error bound is a simple error bound which dates
-back at least to Wilkinson's classic book [Wilkinson1965:Ch.3 §53 p.170]
-
-```bibtex
-@book{Wilkinson1965,
-    address = {Oxford, UK},
-    author = {J H Wilkinson},
-    publisher = {Oxford},
-    title = {The Algebraic Eigenvalue Problem},
-    year = 1965
-}
-```
-
+back at least to Wilkinson's classic book [^Wilkinson1965] Ch.3 §53 p.170.
2. When the Ritz values become sufficiently well-separated, more refined estimates
can be derived from the Rayleigh-Ritz properties of the Krylov process, as
-described in [Wilkinson1965:Ch.3 §54-55 p.173, Yamamoto1980, Ortega1990]
-
-```bibtex
-@book{Ortega1990,
-    address = {Philadelphia, PA},
-    author = {Ortega, James M},
-    doi = {10.1137/1.9781611971323},
-    edition = {2},
-    publisher = {SIAM},
-    series = {Classics in Applied Mathematics},
-    title = {Numerical Analysis: A Second Course},
-    url = {http://epubs.siam.org/doi/book/10.1137/1.9781611971323},
-    year = 1990
-}
-@article{Yamamoto1980,
-    author = {Yamamoto, Tetsuro},
-    doi = {10.1007/BF01396059},
-    journal = {Numerische Mathematik},
-    number = 2,
-    pages = {189--199},
-    title = {Error bounds for computed eigenvalues and eigenvectors},
-    volume = {34},
-    year = 1980
-}
-@article{Geurts1982,
-    author = {Geurts, A J},
-    doi = {10.1007/BF01399313},
-    journal = {Numerische Mathematik},
-    month = feb,
-    number = {1},
-    pages = {85--96},
-    title = {A contribution to the theory of condition},
-    volume = {39},
-    year = {1982}
-}
-@article{Deif1989,
-    author = {Deif, A.},
-    doi = {10.1007/BF01396348},
-    journal = {Numerische Mathematik},
-    month = jun,
-    number = {6},
-    pages = {625--626},
-    title = {A relative backward perturbation theorem for the eigenvalue problem},
-    volume = {56},
-    year = {1989}
-}
-```
-
+described in [^Wilkinson1965] Ch.3 §54-55 p.173, [^Yamamoto1980], [^Ortega1990],
+[^Geurts1982] and [^Deif1989].
+
+[^Wilkinson1965]:
+    Wilkinson, James Hardy. The algebraic eigenvalue problem.
+    Vol. 87. Oxford: Clarendon Press, 1965.
+[^Yamamoto1980]:
+    Yamamoto, Tetsuro. "Error bounds for computed eigenvalues
+    and eigenvectors." Numerische Mathematik 34.2 (1980): 189-199.
+[^Ortega1990]:
+    Ortega, James M. Numerical analysis: a second course.
+    Society for Industrial and Applied Mathematics, 1990.
+[^Geurts1982]:
+    Geurts, A. J. "A contribution to the theory of condition."
+    Numerische Mathematik 39.1 (1982): 85-96.
+[^Deif1989]:
+    Deif, A.
"A relative backward perturbation theorem for the eigenvalue + problem." Numerische Mathematik 56.6 (1989): 625-626. """ function isconverged(L::PartialFactorization, F::Base.LinAlg.SVD, k::Int, tol::Real, reltol::Real, log::ConvergenceHistory, verbose::Bool=false @@ -426,7 +369,7 @@ Thick restart (with ordinary Ritz values) # References -[Hernandez2008] +[^Hernandez2008] """ #Hernandez2008 @@ -462,14 +405,20 @@ function thickrestart!(A, L::PartialFactorization{T,Tr}, end """ - harmonicrestart!(A, L, F, k) + harmonicrestart!(A, L, F, k) -> L -Thick restart with harmonic Ritz values. +Thick restart with harmonic Ritz values. See [^Baglama2005] but note that +they have P and Q swapped relative to our notation, which follows that +of [^Hernandez2008] -# References - -[Baglama2005] - note that they have P and Q swapped relative to our notation, -which follows that of [Hernandez2008] +[^Baglama2005]: + Baglama, James, and Lothar Reichel. "Augmented implicitly restarted + Lanczos bidiagonalization methods." SIAM Journal on Scientific + Computing 27.1 (2005): 19-42. +[^Hernandez2008]: + Vicente Hernández, José E. Román, and Andrés Tomás. "A robust and + efficient parallel SVD solver based on restarted Lanczos bidiagonalization." + Electronic Transactions on Numerical Analysis 31 (2008): 68-85. """ function harmonicrestart!(A, L::PartialFactorization{T,Tr}, @@ -550,63 +499,42 @@ of Lanczos vectors. # Arguments -`A`: matrix or linear map generating the Lanczos vectors - -`L::PartialFactorization`: partial factorization. +- `A`: matrix or linear map generating the Lanczos vectors; +- `L::PartialFactorization`: partial factorization; +- `orthleft::Bool = false`: orthogonalize left Lanczos vectors; +- `orthright::Bool = true`: orthogonalize right Lanczos vectors; +- `α::Real = 1/√2`: criterion for doing a second reorthogonalization. -`orthleft::Bool = false`: orthogonalize left Lanczos vectors. - -`orthright::Bool = true`: orthogonalize right Lanczos vectors. - -`α::Real = 1/√2`: criterion for doing a second reorthogonalization. - -# Implementation note +## Implementation details -The implementation mostly follows the description in [Simon2000,Hernandez2008]. +The implementation mostly follows the description in [^Simon2000] and [^Hernandez2008]. The reorthogonalization method used is using double classical Gram-Schmidt full reorthogonalization. As explained in the numerical analysis literature by Kahan, Golub, Rutishauser, and others in the 1970s, double classical Gram-Schmidt reorthogonalization always suffices to keep vectors orthogonal to -within machine precision. As described in [Bjorck2015], α is a threshold +within machine precision. As described in [^Bjorck2015], `α` is a threshold for determinining when the second orthogonalization is necessary. -log10(α) is the number of (decimal) digits lost due to cancellation. Common choices are -α=0.1 [Rutishauser] and α=1/√2 [Daniel1976] (our default). +`α=0.1` [Rutishauser] and `α=1/√2` [Daniel1976] (our default). In most situations it suffices to orthogonalize either the left vectors or the right vectors, except when the matrix norm exceeds `1/√eps(eltype(A))`, in which case it will be necessary to orthogonalize both sets of vectors. See -[Simon2000]. - -```bibtex -@book{Bjorck2015, - author = {Bj{\\"{o}}rck, {\\AA}ke}, - doi = {10.1007/978-3-319-05089-8}, - publisher = {Springer}, - series = {Texts in Applied Mathematics}, - title = {Numerical Methods in Matrix Computations}, - year = {2015} -} -@article{Simon2000, - author = {Simon, Horst D. 
and Zha, Hongyuan},
-    doi = {10.1137/S1064827597327309},
-    journal = {SIAM Journal on Scientific Computing},
-    number = {6},
-    pages = {2257--2274},
-    title = {Low-Rank Matrix Approximation Using the {Lanczos} Bidiagonalization Process with Applications},
-    volume = {21},
-    year = {2000}
-}
-@article{Daniel1976,
-    author = {Daniel, J. W. and Gragg, W. B. and Kaufman, L. and Stewart, G. W.},
-    doi = {10.1090/S0025-5718-1976-0431641-8},
-    journal = {Mathematics of Computation},
-    number = {136},
-    pages = {772--795},
-    title = {Reorthogonalization and stable algorithms for updating the {Gram-Schmidt QR} factorization},
-    volume = {30},
-    year = {1976}
-}
+[^Simon2000].
+
+[^Bjorck2015]:
+    Björck, Åke. Numerical methods in matrix computations.
+    New York: Springer, 2015.
+[^Simon2000]:
+    Simon, Horst D., and Hongyuan Zha. "Low-rank matrix approximation
+    using the Lanczos bidiagonalization process with applications."
+    SIAM Journal on Scientific Computing 21.6 (2000): 2257-2274.
+[^Daniel1976]:
+    Daniel, James W., et al. "Reorthogonalization and stable algorithms
+    for updating the Gram-Schmidt QR factorization." Mathematics of
+    Computation 30.136 (1976): 772-795.
-```
"""
function extend!(