Merge pull request #124 from gustavdelius/typos
Updating section that looks at Flux.jl code, which has changed
ChrisRackauckas committed May 3, 2023
2 parents 1115c17 + 7aed018 commit 9228207
Showing 3 changed files with 31 additions and 17 deletions.
40 changes: 27 additions & 13 deletions _weave/lecture03/sciml.jmd
@@ -13,8 +13,8 @@ weave_options:

Here we will start to dig into what scientific machine learning is all about
by looking at physics-informed neural networks. Let's start by understanding
-what a neural network really is, why they are used, and what kinds of problems
-that they solve, and then we will use this understanding of a neural network
+what neural networks really are, why they are used, and what kinds of problems
+they solve, and then we will use this understanding of a neural network
to see how to solve ordinary differential equations with neural networks.
From there, we will use this method to regularize neural networks with physical
equations, the aforementioned physics-informed neural network, and see how to
@@ -121,8 +121,8 @@ NN3(rand(10))
```

The second activation function there is what's known as a `relu`. A `relu` can
-be good to use because it's an exceptionally operation and satisfies a form of
-the UAT. However, a downside is that its derivative is not continuous, which
+be good to use because it's an exceptionally fast operation and satisfies a form of
+the universal approximation theorem (UAT). However, a downside is that its derivative is not continuous, which
could impact the numerical properties of some algorithms, and thus it's widely
used throughout standard machine learning but we'll see reasons why it may be
disadvantageous in some cases in scientific machine learning.
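
As an aside (not part of the diff): a minimal sketch of the `relu` point above. The `relu` and `drelu` below are local stand-ins defined only for illustration, not the Flux versions.

```julia
# relu(x) = max(0, x): a single comparison, so it is exceptionally cheap, yet
# networks built from it still satisfy a form of the universal approximation theorem.
relu(x) = max(zero(x), x)

# Its slope jumps from 0 to 1 at x = 0, so the derivative is not continuous there.
drelu(x) = x > zero(x) ? one(x) : zero(x)

relu.([-2.0, -0.5, 0.0, 0.5, 2.0])   # [0.0, 0.0, 0.0, 0.5, 2.0]
drelu.([-1e-8, 1e-8])                # [0.0, 1.0] -- a jump across zero
```
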
@@ -144,7 +144,7 @@ using InteractiveUtils
@which Dense(10 => 32,tanh)
```

-If we go to that spot of the documentation, we find the following.
+If we go to that spot of the code, we find the following:

```julia;eval=false
struct Dense{F, M<:AbstractMatrix, B}
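  # (Editorial note, not part of the Flux source: the rest of this definition is
  #  truncated by the hunk boundary. As the discussion below explains, it holds a
  #  weight matrix of type M, a bias of type B, and an activation function of
  #  type F, plus an inner constructor; the exact field names are not shown here.)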
@@ -164,14 +164,27 @@ end
```

First, `Dense` defines a struct in Julia. This struct just holds a weight matrix
-`W`, a bias vector `b`, and an activation function `σ`. The function called
-`Dense` is what's known as an **outer constructor** which defines how the
-`Dense` type is built. If you give it two integers (and optionally an activation
+`W`, a bias vector `b`, and an activation function `σ`. It also defines an **inner constructor**
+that ensures that a created `Dense` object will have the desired properties and types
+for its fields.
+The function called `Dense` that is defined next, outside the `struct`, is what's known
+as an **outer constructor** which provides a more convenient way to create a `Dense`
+object. If you give it a `Pair` of integers (and optionally an activation
function which defaults to `identity`), then what it will do is take random
initial `W` and `b` matrices (according to the `glorot_uniform` distribution for
`W` and `zeros` for `b`), and then it will build the type with those matrices.

-The last portion might be new. This is known as a **callable struct**, or a
+The next portion might be new. We give it here in the simpler form it had in earlier
+versions of the Flux package, so that we can concentrate on the essentials:
+
+```julia;eval=false
+function (a::Dense)(x::AbstractArray)
+  W, b, σ = a.W, a.b, a.σ
+  σ.(W*x .+ b)
+end
+```
+
+This defines what is known as a **callable struct**, or a
functor. It defines the dispatch for how calls work on the struct. As a quick
demonstration, let's define a type `MyCallableStruct` with a field `x`, and then make instances
of `MyCallableStruct` be the function `x+y`:
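
As an aside (not part of the diff): the demonstration block that this colon introduces lies outside the displayed hunk. A minimal sketch of such a callable struct, reusing the names from the prose, could look like the following; the array method at the end also illustrates the point made in the next hunk about dispatching on the input type.

```julia
# A struct with a single field `x` ...
struct MyCallableStruct
    x
end

# ... with a convenience outer constructor that defaults the field to 0.0 ...
MyCallableStruct() = MyCallableStruct(0.0)

# ... made callable: an instance applied to `y` computes x + y.
(a::MyCallableStruct)(y::Number) = a.x + y

# Dispatch can also depend on the input type, e.g. broadcasting over arrays.
(a::MyCallableStruct)(y::AbstractArray) = a.x .+ y

a = MyCallableStruct(2.0)
a(3.0)          # 5.0
a([1.0, 2.0])   # [3.0, 4.0]
```
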
@@ -191,7 +204,7 @@ an object in a way that references the `self`, though it's a bit more general
due to allowing dispatching, i.e. this can then depend on the input types
as well.

-So let's look at that `Dense` call with this in mind:
+So let's look at `Dense` with this in mind:

```julia;eval=false
function (a::Dense)(x::AbstractArray)
@@ -216,7 +229,8 @@ inside of them. Now what does `Chain` do?
@which Chain(1,2,3)
```

-gives us:
+Again, for our explanations here we will look at the slightly simpler code from
+an earlier version of the Flux package:

```julia;eval=false
struct Chain{T<:Tuple}
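# --- Editorial aside, not part of the Flux source above (which is truncated by
#     the hunk boundary): a minimal Chain-like functor, sketched here only to
#     illustrate the idea. It stores its layers in a tuple and, when called,
#     threads the input through them in order.
#
#     struct MiniChain{T<:Tuple}
#         layers::T
#     end
#     MiniChain(layers...) = MiniChain(layers)
#     (c::MiniChain)(x) = foldl((input, layer) -> layer(input), c.layers; init = x)
#
#     MiniChain(x -> 2x, x -> x .+ 1)([1.0, 2.0])   # [3.0, 5.0]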
@@ -365,7 +379,7 @@ loss() = sum(abs2,sum(abs2,NN(rand(10)).-1) for i in 1:100)
loss()
```

-This loss function takes 100 random points in `[0,1]` and then computes the output
+This loss function takes 100 random points in ``[0,1]^{10}`` and then computes the output
of the neural network minus `1` on each of the values, and sums up the squared
values (`abs2`). Why the squared values? This means that every computed loss value
is positive, and so we know that by decreasing the loss this means that, on average
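
As an aside (not part of the diff; the rest of this paragraph lies outside the displayed hunk): a runnable sketch of the loss construction described above, with a hypothetical stand-in `NNstub` in place of the Flux network `NN`.

```julia
W = randn(5, 10)                     # hypothetical fixed weights: x ↦ tanh.(W*x)
NNstub(x) = tanh.(W * x)             # stand-in for the neural network NN

# 100 random points in [0,1]^10; for each, sum the squared deviations of the
# outputs from 1, then accumulate (mirroring the nested abs2 in loss() above).
loss() = sum(abs2, sum(abs2, NNstub(rand(10)) .- 1) for i in 1:100)
loss()
```
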
@@ -660,7 +674,7 @@ one-dimensional spring pushing and pulling against a wall.

But instead of the simple spring, let's assume we had a more complex spring,
for example, let's say ``F(x) = -kx + 0.1sin(x)`` where this extra term is due to
-some deformities in the medal (assume mass=1). Then by Newton's law of motion
+some deformities in the metal (assume mass=1). Then by Newton's law of motion
we have a second order ordinary differential equation:

```math
2 changes: 1 addition & 1 deletion course/index.md
@@ -7,7 +7,7 @@ weave = false

## Syllabus

-**Pre-recorded online lectures are available to compliment the lecture notes**
+**Pre-recorded online lectures are available to complement the lecture notes**

**Prerequisites**: While this course will be mixing ideas from high performance
computing, numerical analysis, and machine learning, no one in the course is
6 changes: 3 additions & 3 deletions index.md
@@ -54,15 +54,15 @@ modeling.

However, these methods will quickly run into a scaling issue if naively coded.
To handle this problem, everything will have a focus on performance-engineering.
-We will start by focusing on algorithm which are inherently serial and
+We will start by focusing on algorithms that are inherently serial and
learn to optimize serial code. Then we will showcase how logic-heavy
code can be parallelized through multithreading and distributed computing
techniques like MPI, while direct mathematical descriptions can be parallelized
through GPU computing.

The final part of the course will be a unique project which pulls together these
techniques. As a new field, the students will be exposed to the "low hanging
fruit" and will be directed towards an area which they can make a quick impact.
fruit" and will be directed towards an area in which they can make a quick impact.
For the final project, students will team up to solve a new problem in the field of
-scientific machine learning, and receive helping writing up a publication-quality
+scientific machine learning, and receive help in writing up a publication-quality
analysis about their work.
