diff --git a/docs/src/prob_and_solve.md b/docs/src/prob_and_solve.md index 2b727e9bf..3fec93e7c 100644 --- a/docs/src/prob_and_solve.md +++ b/docs/src/prob_and_solve.md @@ -1,8 +1,10 @@ -# Problem Definition And Solution +# Problem Definition and Solution -As can be seen from the [introduction examples](@id Quickstart), [DataDrivenDiffEq.jl](https://github.com/SciML/DataDrivenDiffEq.jl) tries to structurize the workflow in a similar fashion to other [SciML](https://sciml.ai/) packages by defining a [`DataDrivenProblem`](@ref), dispatching on the `solve` command to return a [`DataDrivenSolution`](@ref). +As can be seen from the [introduction examples](@ref Quickstart), [DataDrivenDiffEq.jl](https://github.com/SciML/DataDrivenDiffEq.jl) tries to structure the workflow in a similar fashion to other [SciML](https://sciml.ai/) packages by defining a [`DataDrivenProblem`](@ref), dispatching on the `solve` command to return a [`DataDrivenSolution`](@ref). -A problem in the sense of identification, estimation or inference is defined by the data describing it. This data contains at least measurements of the states `X`, which would be sufficient to describe a `DiscreteDataDrivenProblem` with unit time steps similar to the [first example on dynamic mode decomposition](@ref Linear-Systems-via-Dynamic-Mode-Decomposition). Of course we can extend this to include time points `t`, control signals `U` or a function describing those `u(x,p,t)`. Additionally, any parameters `p` known a priori can be included in the problem. In practice, this looks like +## Defining a Problem + +Problems of identification, estimation or inference are defined by the data. This data contains at least measurements of the states `X`, which would be sufficient to describe a `DiscreteDataDrivenProblem` with unit time steps similar to the [first example on dynamic mode decomposition](@ref Linear-Systems-via-Dynamic-Mode-Decomposition). Of course we can extend this to include time points `t`, control signals `U` or a function describing them, `u(x,p,t)`. Additionally, any parameters `p` known a priori can be included in the problem. In practice, this looks like: ```julia problem = DiscreteDataDrivenProblem(X) @@ -12,7 +14,7 @@ problem = DiscreteDataDrivenProblem(X, t, U, p = p) problem = DiscreteDataDrivenProblem(X, t, (x,p,t)->u(x,p,t)) ``` -Similarly, a `ContinuousDataDrivenProblem` would need at least measurements and time-derivatives (`X` and `DX`) or measurements, time information and a way to derive the time derivatives(`X`, `t` and a [Collocation](@ref) method). Again, this can be extended by including a control input as measurements or a function and possible parameters. +Similarly, a `ContinuousDataDrivenProblem` would need at least measurements and time-derivatives (`X` and `DX`) or measurements, time information and a way to derive the time derivatives (`X`, `t` and a [Collocation](@ref) method). Again, this can be extended by including a control input as measurements or a function and possible parameters: ```julia problem = ContinuousDataDrivenProblem(X, DX) @@ -45,7 +47,9 @@ problem = DirectDataDrivenProblem(X, t, Y, p = p) problem = DirectDataDrivenProblem(X, t, Y, (x,p,t)->u(x,p,t), p = p) ``` -Next up, we choose a method to `solve` the [`DataDrivenProblem`](@ref). Depending on the input arguments and the type of problem, the function will return a result derived via [`Koopman`](@ref) or [`Sparse Optimization`](@ref) methods. 
Different options can be provided as well as a [`Basis`](@ref) used for lifting the measurements, to control different options like rounding, normalization or the progressbar depending on the inference method. Possible options are provided [below](@ref optional_arguments). +## Solving a Problem + +Next up, we choose a method to `solve` the [`DataDrivenProblem`](@ref). Depending on the input arguments and the type of problem, the function will return a result derived via [`Koopman`](@ref) or [`Sparse Optimization`](@ref) methods. Different options can be provided, as well as a [`Basis`](@ref) used for lifting the measurements, to control settings like rounding, normalization or the progress bar, depending on the inference method. Possible options are provided [below](@ref optional_arguments): ```julia # Use a Koopman based inference res = solve(problem, DMDSVD(), kwargs...) @@ -54,7 +58,7 @@ res = solve(problem, basis, STLQS(), kwargs...) -The [`DataDrivenSolution`](@ref) `res` contains a `result` which is the infered system and a [`Basis`](@ref), `metrics` which is a `NamedTuple` containing different metrics like the L2 error of the infered system with the provided data and the [`AICC`](@ref). These can be accessed via +The [`DataDrivenSolution`](@ref) `res` contains a `result`, which is the inferred system and a [`Basis`](@ref), and `metrics`, a `NamedTuple` containing different metrics like the L2 error of the inferred system with respect to the provided data and the [`AICC`](@ref). These can be accessed via: ```julia # The infered system @@ -97,7 +101,7 @@ res = solve(problem, basis, DMDSVD(), kwargs...) If control signals are present, they get processed according to [this paper](https://epubs.siam.org/doi/abs/10.1137/15M1013857?mobileUi=0) for dynamic mode decomposition and [as described here](https://epubs.siam.org/doi/pdf/10.1137/16M1062296) for extended dynamic mode decomposition assuming a linear relationship on the operator. -Possible keyworded arguments include +Possible keyword arguments include: + `B` a linear mapping known a priori which maps the control signals onto the lifted states + `digits` controls the digits / rounding used for deriving the system equations (`digits = 1` would round `10.02` to `10.0`) + `operator_only` returns a `NamedTuple` containing the operator, input and output mapping and matrices used for updating the operator as described [here](https://arxiv.org/pdf/1406.7187.pdf) @@ -105,7 +109,7 @@ Possible keyworded arguments include !!! info If `eval_expression` is set to `true`, the returning result of the Koopman based inference will not contain a parametrized equation, but rather use the numeric values of the operator/generator. -SINDy based algorithms can be called like : +SINDy based algorithms can be called like: ```julia res = solve(problem, basis, STLQS(), kwargs...) @@ -121,7 +125,7 @@ equations ``f(u_t, u, p, t) = 0`` can be passed in res = solve(problem, basis, ImplicitOptimizer(), implicits, kwargs...) ``` -This indicates that we do not have coupling between two implicitly defined variables if not explicitly given. To elaborate this complicated sentence, consider the following: +This indicates that we do not have coupling between two implicitly defined variables unless coupling is explicitly included. 
To elaborate, consider the following: ```julia basis = Basis([x, y, z], [x,y,z]) res = solve(problem, basis, ImplicitOptimizer(), [x,y]) ``` @@ -143,7 +147,7 @@ basis = Basis([x, y, z, x*y], [x,y,z]) res = solve(problem, basis, ImplicitOptimizer(), [x,y]) ``` -Would allow solutions of the form `x = y*x + z` or `y = y*x + z` to occur while suppressing `x = y + z`. This is due to the reason that `y*x` includes both `x` and `y`, so the function will get included in the evaluation. +Would allow solutions of the form `x = y*x + z` or `y = y*x + z` to occur while suppressing `x = y + z`. This is because `y*x` includes both `x` and `y`, so the function will be included in the evaluation. Possible keyworded arguments include + `normalize` normalizes the data matrix ``\\Theta`` such that each row ( correspondng to candidate functions) has a 2-norm of `1.0` diff --git a/docs/src/quickstart.md b/docs/src/quickstart.md index 21122af8a..f4854fd0b 100644 --- a/docs/src/quickstart.md +++ b/docs/src/quickstart.md @@ -37,7 +37,7 @@ system = result(res) println(system) # hide ``` -The [`DataDrivenSolution`](@ref) contains an explicit result which is a [`Koopman`](@ref), defining all necessary information, e.g. the associated operator (which corresponds to our abefore defined matrix ``A``). +The [`DataDrivenSolution`](@ref) contains an explicit result which is a [`Koopman`](@ref), defining all necessary information, e.g. the associated operator (which corresponds to our matrix ``A``). ```@example 4 Matrix(system) @@ -48,11 +48,11 @@ In general, we can skip the expensive progress of deriving a callable symbolic s res = solve(prob, DMDSVD(), digits = 1, operator_only = true) ``` -Where `K` is the associated operator given as its eigendecomposition, `B` is a possible mapping of inputs onto the states, `C` is the linear mapping from the lifted observeables back onto the original states and `Q` and `P` are used for updating the operator. +Where `K` is the associated operator given as its eigendecomposition, `B` is a possible mapping of inputs onto the states, `C` is the linear mapping from the lifted observables back onto the original states and `Q` and `P` are used for updating the operator. ## Nonlinear System with Extended Dynamic Mode Decomposition -Similarly, we can use the [Extended Dynamic Mode Decomposition](https://link.springer.com/article/10.1007/s00332-015-9258-5) via a nonlinear [`Basis`](@ref) of observeables. Here, we look a rather [famous example](https://arxiv.org/pdf/1510.03007.pdf) with a finite dimensional solution. +Similarly, we can use the [Extended Dynamic Mode Decomposition](https://link.springer.com/article/10.1007/s00332-015-9258-5) via a nonlinear [`Basis`](@ref) of observables. Here, we will look at a rather [famous example](https://arxiv.org/pdf/1510.03007.pdf) with a finite dimensional solution. ```@example 3 using DataDrivenDiffEq using ModelingToolkit using LinearAlgebra using OrdinaryDiffEq using Plots @@ -77,7 +77,7 @@ savefig("EDMD_Example_1.png") # hide ``` ![](EDMD_Example_1.png) -Since we are dealing with an continuous system in time, we define the associated [`DataDrivenProblem`](@ref) accordingly using the measured states `X`, their derivates `DX` and the time `t`. +Since we are dealing with a continuous system in time, we define the associated [`DataDrivenProblem`](@ref) accordingly using the measured states `X`, their derivatives `DX` and the time `t`. 
```@example 3 prob = ContinuousDataDrivenProblem(solution) ``` @@ -104,7 +104,7 @@ operator(system) ## Nonlinear Systems - Sparse Identification of Nonlinear Dynamics -To find the underlying system without any [`Algortihms`](@ref koopman_algorithms) related to Koopman operator theory, we can use [Sparse Identification of Nonlinear Dynamics](https://www.pnas.org/content/113/15/3932) - SINDy for short. As the name suggests, it finds the sparsest basis of functions which build the observed trajectory. Again, we will start with a nonlinear system +To find the underlying system without any [`Algorithms`](@ref koopman_algorithms) related to Koopman operator theory, we can use [Sparse Identification of Nonlinear Dynamics](https://www.pnas.org/content/113/15/3932) - SINDy for short. As the name suggests, it finds the sparsest basis of functions which build the observed trajectory. Again, we will start with a nonlinear system: ```@example 1 using DataDrivenDiffEq using ModelingToolkit using LinearAlgebra using OrdinaryDiffEq using Plots using Random using Symbolics: scalarize -Random.seed!(1111) # Due to the noise +Random.seed!(1111) # For the noise # Create a nonlinear pendulum function pendulum(u, p, t) @@ -183,7 +183,7 @@ println(res) # hide !!! info A more detailed description of the result can be printed via `print(res, Val{true})`, which also includes the discovered equations and parameter values. -Where the resulting [`DataDrivenSolution`](@ref) stores information about the infered model and the parameters: +Where the resulting [`DataDrivenSolution`](@ref) stores information about the inferred model and the parameters: ```@example 1 system = result(res); @@ -290,7 +290,7 @@ println(system) The following is another example on how to use the [`ImplicitOptimizer`](@ref) and is taken from the [original paper](https://royalsocietypublishing.org/doi/10.1098/rspa.2020.0279). -As always, we start by creating a corresponding dataset. +As always, we start by creating a corresponding dataset: ```@example 5 using DataDrivenDiffEq using ModelingToolkit using LinearAlgebra using OrdinaryDiffEq using Plots @@ -334,7 +334,7 @@ savefig("SINDy_Example_Data_3.png") # hide ``` ![](SINDy_Example_Data_3.png) -Next, we define a sufficient [`Basis`](@ref) +Next, we define a sufficient [`Basis`](@ref): ```@example 5 using Symbolics: scalarize @@ -365,10 +365,10 @@ basis= Basis(implicits, [u; du], controls = x, iv = t); println(basis) # hide ``` -And solve the problem by varying over a sufficient set of thresholds for the associated optimizer. +We solve the problem by varying over a sufficient set of thresholds for the associated optimizer. Additionally we activate the `scale_coefficients` option for the [`ImplicitOptimizer](@ref), which helps to find sparse equations by normalizing the resulting coefficient matrix after each suboptimization. -To evaluate the pareto optimal solution over, we use the functions `f` and `g` which can be passed as keyworded arguements into the `solve` function. `f` is a function with different signatures for different optimizers, but returns the ``L_0`` norm of the coefficients and the ``L_2`` error of the current model. `g` takes this vector and projects it down onto a scalar, using the ``L_2`` norm per default. However, here we want to use the `AIC` of the output of `f`. A noteworthy exception is of course, that we want only results with two or more active coefficents. Hence we modify `g` accordingly. +To evaluate the Pareto optimal solution, we use the functions `f` and `g` which can be passed as keyword arguments into the `solve` function. 
`f` is a function with different signatures for different optimizers, but returns the ``L_0`` norm of the coefficients and the ``L_2`` error of the current model. `g` takes this vector and projects it down onto a scalar, using the ``L_2`` norm by default. However, here we want to use the `AIC` of the output of `f`. A noteworthy exception is that we only want results with two or more active coefficients. Hence we modify `g` accordingly. ```@example 5 λ = [1e-4;5e-4;1e-3;2e-3;3e-3;4e-3;5e-3;6e-3;7e-3;8e-3;9e-3;1e-2;2e-2;3e-2;4e-2;5e-2; diff --git a/docs/src/symbolic_regression.md b/docs/src/symbolic_regression.md index a7ff21a1d..ba04a0b84 100644 --- a/docs/src/symbolic_regression.md +++ b/docs/src/symbolic_regression.md @@ -5,7 +5,7 @@ Using [sparse regression](@ref sparse_optimization) limits the discovery to a ge ## SymbolicRegression !!! warning - This feature requires the explicit loading of [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl) in addition to `DataDrivenDiffEq`. It will _only_ be useable if used like + This feature requires the explicit loading of [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl) in addition to `DataDrivenDiffEq`. It will _only_ be usable if loaded like: ```julia using DataDrivenDiffEq using SymbolicRegression @@ -49,18 +49,18 @@ EQSearch ## OccamNet !!! warning - This feature requires the explicit loading of [Flux.jl](https://fluxml.ai/) in addition to `DataDrivenDiffEq`. It will _only_ be useable if used like + This feature requires the explicit loading of [Flux.jl](https://fluxml.ai/) in addition to `DataDrivenDiffEq`. It will _only_ be usable if loaded like: ```julia using DataDrivenDiffEq using Flux ``` As introduced in [Interpretable Neuroevolutionary Models for Learning Non-Differentiable Functions and Programs -](https://arxiv.org/abs/2007.10784), `OccamNet` is a special form of symbolic regression which uses a probabilistic approach to equation discovery by using a feedforward multilayer neural network. In constrats to normal architectures, each layers weight reflect the probability which inputs to use. Additionally not only a single activation function is used, but a set of functions. Similar to simulated annealing, a temperature is included to control the exploration of possible functions. +](https://arxiv.org/abs/2007.10784), `OccamNet` is a special form of symbolic regression which takes a probabilistic approach to equation discovery using a feedforward multilayer neural network. In contrast to normal architectures, each layer's weights reflect the probability of which inputs to use. Additionally, a set of activation functions is used instead of a single function. Similar to simulated annealing, a temperature is included to control the exploration of possible functions. `DataDrivenDiffEq` offers two main interfaces to `OccamNet`: a `Flux` based API with `Flux.train!` and a `solve(...)` function. -Consider the following example, where we want to discover a vector valued function +Consider the following example, where we want to discover a vector-valued function: ```@example occamnet_flux using DataDrivenDiffEq @@ -78,20 +78,20 @@ f(x) = [sin(π*x[2]+x[1]); exp(x[2])] Y = hcat(map(f, eachcol(X))...) 
``` -Next, we define our network +Next, we define our network: ```@example occamnet_flux net = OccamNet(2, 2, 3, Function[sin, +, *, exp], skip = true, constants = Float64[π]) ``` -Where `2,2,3` refers to input and output dimension and the number of layers _without the output layer_. We also define that each layer uses the functions `sin, +, *, exp` as activations an use a `π` as a constant, which get concanated to the input data. Additionally, `skip` indicates the useage of skip connections, which allow the output of each layer to be passed onto the output layer directly. +Where `2,2,3` refers to the input and output dimensions and the number of layers _without the output layer_. We also define that each layer uses the functions `sin, +, *, exp` as activations and uses `π` as a constant, which gets concatenated to the input data. Additionally, `skip` indicates the usage of skip connections, which allow the output of each layer to be passed onto the output layer directly. To train the network over `100` epochs using `ADAM`, we type ```@example occamnet_flux Flux.train!(net, X, Y, ADAM(1e-2), 100, routes = 100, nbest = 3) ``` -Under the hood, we select `routes` possible routes through the network based on the probability reflect by the [`ProbabilityLayer`](@ref) forming the network. From these we take the `nbest` candidates to train the parameters of the network, meaning increase the probability of those routes. +Under the hood, we sample `routes` possible routes through the network based on the probabilities reflected by the [`ProbabilityLayer`](@ref)s forming the network. From these we take the `nbest` candidates to train the parameters of the network, meaning we increase the probability of those routes. Lets have a look at some possible equations after the initial training. We can use `rand` to sample a route through the network, compute the output probability with `probability` and transform it into analytical equations by simply using `ModelingToolkit`s variables as input. The call `net(x, route)` uses the route to compute just the element on this path. @@ -118,7 +118,7 @@ for i in 1:10 end ``` -The network is quite certain about the equation now, which is in fact our unknown mapping. To extract the solution with the highest probability, we set the temperature of the underlying distribution to a very low value. In the limit of `t ↦ 0` we approach a dirac distribution for the and hence extracting the most likely terms. +The network is quite certain about the equation now, which is in fact our unknown mapping. To extract the solution with the highest probability, we set the temperature of the underlying distribution to a very low value. In the limit `t → 0` we approach a Dirac distribution and hence extract the most likely terms. ```@example occamnet_flux set_temp!(net, 0.01) @@ -142,13 +142,13 @@ println(res) #hide Within `solve` the network is generated using the information provided by the [DataDrivenProblem](@ref) in form of states, control and independent variables as well as the specified options, followed by training the network and extracting the equation with the highest probability by setting the temperature as above. After computing additional metrics, a [DataDrivenSolution](@ref) is returned where the equations are transformed into a [`Basis`](@ref) useable with `ModelingToolkit`. 
-The metrics can be accessed via +The metrics can be accessed via: ```@example occamnet_flux metrics(res) ``` -and the resulting [`Basis`](@ref) by +and the resulting [`Basis`](@ref) by: ```@example occamnet_flux result(res) diff --git a/src/basis.jl b/src/basis.jl index 53a987c6f..f4a26c380 100644 --- a/src/basis.jl +++ b/src/basis.jl @@ -4,7 +4,7 @@ using ModelingToolkit: value, operation, arguments, istree, get_observed """ $(TYPEDEF) -A basis over the states with parameters , independent variable and possible exogenous controls. +A basis over the states with parameters, independent variable, and possible exogenous controls. It extends an `AbstractSystem` as defined in `ModelingToolkit.jl`. `f` can either be a Julia function which is able to use ModelingToolkit variables or a vector of `eqs`. It can be called with the typical SciML signature, meaning out of place with `f(u,p,t)` or in place with `f(du, u, p, t)`. If control inputs are present, it is assumed zero for all inputs. The corresponding function calls are `f(u,p,t,inputs)` and `f(du,u,p,t,inputs)` and need to be specified fully. -If `linear_independent` is set to `true`, a linear independent basis is created from all atom function in `f`. +If `linear_independent` is set to `true`, a linearly independent basis is created from all atom functions in `f`. If `simplify_eqs` is set to `true`, `simplify` is called on `f`. diff --git a/src/koopman/type.jl b/src/koopman/type.jl index 7c9be901c..4eb760265 100644 --- a/src/koopman/type.jl +++ b/src/koopman/type.jl @@ -8,7 +8,7 @@ or in place with `f(du, u, p, t)`. If control inputs are present, it is assumed zero for all inputs. The corresponding function calls are `f(u,p,t,inputs)` and `f(du,u,p,t,inputs)` and need to be specified fully. -If `linear_independent` is set to `true`, a linear independent basis is created from all atom function in `f`. +If `linear_independent` is set to `true`, a linearly independent basis is created from all atom functions in `f`. If `simplify_eqs` is set to `true`, `simplify` is called on `f`. diff --git a/src/problem.jl b/src/problem.jl index 07b5441b4..5ddea0f08 100644 --- a/src/problem.jl +++ b/src/problem.jl @@ -56,7 +56,7 @@ The `DataDrivenProblem` defines a general estimation problem given measurements, Two construction methods are available: + `DiscreteDataDrivenProblem` for time discrete systems -+ `ContinousDataDrivenProblem` for systems continouos in time ++ `ContinuousDataDrivenProblem` for systems continuous in time both are aliases for constructing a problem. @@ -74,10 +74,10 @@ X, DX, t = data... # Define a discrete time problem prob = DiscreteDataDrivenProblem(X) -# Define a continous time problem without explicit time points +# Define a continuous time problem without explicit time points prob = ContinuousDataDrivenProblem(X, DX) -# Define a continous time problem without explict derivatives +# Define a continuous time problem without explicit derivatives prob = ContinuousDataDrivenProblem(X, t) # Define a discrete time problem with an input function as a function diff --git a/src/symbolic_regression/occamnet.jl b/src/symbolic_regression/occamnet.jl index 2fd119658..2812db462 100644 --- a/src/symbolic_regression/occamnet.jl +++ b/src/symbolic_regression/occamnet.jl @@ -167,11 +167,11 @@ Flux.trainable(u::ProbabilityLayer) = (u.weight,) """ $(TYPEDEF) -Defines a `OccamNet` which learns symbolic expressions from data using a probabalistic approach. 
+Defines an `OccamNet` which learns symbolic expressions from data using a probabilistic approach. See [Interpretable Neuroevolutionary Models for Learning Non-Differentiable Functions and Programs ](https://arxiv.org/abs/2007.10784) for more details. -It get constructed via +It is constructed via: ```julia net = OccamNet(inp::Int, outp::Int, layers::Int, f::Vector{Function}, t::Real = 1.0; constants = typeof(t)[], parameters::Int = 0, skip::Bool = false, init_w = ones, init_p = Flux.glorot_uniform) ``` `inp` describes the size of the input domain, `outp` the size of the output domain, `layers` the number of layers (including the input layer and excluding the linear output layer) and `f` the functions to be used. Optional is the temperature `t` which is set to `1.0` at the beginning. -Keyworded arguments are `constants`, a vector of constants like π, ℯ which can concanated to the input, the number of trainable `parameters` and if `skip` connections should be used. +Keyword arguments are `constants`, a vector of constants like π, ℯ which can be concatenated to the input, the number of trainable `parameters` and whether `skip` connections should be used. The constructors to the weights and parameters can be passed in via `init_w` and `init_p`. `OccamNet` is callable with and without a specific route, which can be sampled from the networks weights via `rand(net)`.
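A minimal usage sketch for the constructor documented above, assuming `Flux` is loaded alongside `DataDrivenDiffEq` (as required for `OccamNet`); the input data here is arbitrary, and the constructor and route-based calls mirror the quickstart section of this documentation:

```julia
using DataDrivenDiffEq
using Flux

# 2 inputs, 2 outputs, 3 layers, with sin, +, * and exp as activations and π available as a constant
net = OccamNet(2, 2, 3, Function[sin, +, *, exp], skip = true, constants = Float64[π])

x = rand(2)        # an arbitrary input sample
route = rand(net)  # sample a route through the network from the layer probabilities
y = net(x, route)  # evaluate only the functions selected by this route
```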