CarlosJHR64 edited this page Apr 10, 2023 · 63 revisions

Neuronet wiki

Here's a quick review of the math. Please allow the terse notation as the algebra gets gnarly.

Syntax

Operator precedence is as in Ruby:

  • Unary right binding operators
  • *, /
  • +, -
  • =

But I add spacing to create groups:

  • 𝑎 + 𝑏/𝑐 + 𝑑 = 𝑎 + (𝑏/𝑐) + 𝑑
  • 𝑎+𝑏 / 𝑐+𝑑 = (𝑎+𝑏) / (𝑐+𝑑)

The above spacing rule reduces the number of symbols needed to show structure and makes the algebra less cluttered.

The product, *, may be implied:

  • 𝑎*𝑏 = 𝑎 𝑏 = 𝑎𝑏
  • (𝑎+𝑏)*(𝑐+𝑑) = 𝑎+𝑏 𝑐+𝑑
  • 𝑥² = 𝑥𝑥 = 𝑥*𝑥

Definitions are set by := and consequent equivalences by =.

I may use Einstein notation. And once indices are shown, they may be dropped:

  • ∑ₙ(𝑾ₙ*𝒂ₙ) = 𝑾ⁿ𝒂ₙ = 𝑾𝒂
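
To make the implied sum concrete, here is a minimal Ruby sketch. The method name `implied_sum` and the sample numbers are my own assumptions for illustration; it just spells out the sum over the repeated index as a dot product.

```ruby
# Hypothetical helper: sum over the repeated index n of w[n] * a[n].
def implied_sum(w, a)
  w.zip(a).sum { |wn, an| wn * an }
end

w = [0.5, -1.0, 2.0]
a = [1.0, 0.5, 0.25]
puts implied_sum(w, a) # the terms are 0.5, -0.5 and 0.5
```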

Be aware of these rules.

Style

Referencing Wikipedia's Mathematical operators and symbols in Unicode and Unicode subscripts and superscripts:

  • Italic small (𝑎..𝑧): scalar variables.
  • Bold italic small (𝒂..𝒛): single-indexed variables, vectors.
  • Bold italic capital (𝑨..𝒁): multi-indexed variables, matrices.
  • Bold script capital (𝓐..𝓩): operators, like 𝓓𝑥.
  • Double struck small (𝕒..𝕫): finite ordered sets.
  • Bold Fraktur small (𝖆..𝖟): derived constant parameters.

Next level unary postfix operator

Consider a value in the collection 𝒂 at level h that depends on the values in the collection 𝒂 at level i:

  • 𝒂ₕ := ⌈(𝒃ₕ + ∑ᵢ(𝑾ₕᵢ * 𝒂ᵢ))

The index ₕ enumerates values of 𝒂 in level h, whereas ᵢ enumerates values of 𝒂 in level i. The levels are labeled alphabetically:

  • {...,ₕ,ᵢ,ⱼ,ₖ,ₗ,ₘ,ₙ,ₒ,ₚ,...}

I'll want to express the relation between levels without specifying the level. Given the above, please allow:

  • 𝒂 = ⌈(𝒃 + 𝑾 𝒂')
  • 𝒂 = ⌈ 𝒃+𝑾(𝒂')
  • 𝒂 = ⌈ 𝒃+𝑾𝒂'

Binary competition

In The Math of Species Conflict - Numberphile the following function is referred to as "binary competition":

  • 𝓑(𝑥) := 𝑥 * (1 - 𝑥)

This form occurs in the derivative of the squash function, and so I'll use 𝓑 in its expression.

Squash

# Please let:
⌉(𝑥) := Math.exp(𝑥)
# Define the squash function:
⌈(𝑥) := 1 / (1 + Math.exp(-𝑥))
⌈(𝑥) = 1 / (1 + ⌉(-𝑥))
⌈𝑥 = 1 / 1+⌉-𝑥
   = ⌉𝑥 / ⌉𝑥+1
⌈𝑥 = ⌉𝑥 / 1+⌉𝑥
⌈(𝑥) = ⌉(𝑥) / (1 + ⌉(𝑥)) # Alternate definition of squash
# Equivalence 1-⌈𝑥 = ⌈-𝑥
1 - ⌈(𝑥) = 1 - (⌉(𝑥) / (1 + ⌉(𝑥)))
1-⌈𝑥 = 1 - ⌉𝑥 / 1+⌉𝑥
     = ⌉𝑥+1-⌉𝑥 / 1+⌉𝑥
     = 1 / 1+⌉𝑥
1-⌈𝑥 = ⌈-𝑥
1 - ⌈(𝑥) = ⌈(-𝑥)
# Equivalence ⌈-𝑥 = 1-⌈𝑥
⌈(-𝑥) = 1 - ⌈(𝑥)
⌈-𝑥 = 1-⌈𝑥
# Equivalence ⌈𝑥 = 1-⌈-𝑥
⌈(𝑥) = 1 - ⌈(-𝑥)
⌈𝑥 = 1-⌈-𝑥
# Derivative:
𝓓𝑥(⌈(𝑥)) = 𝓓𝑥(1 / (1 + ⌉(-𝑥)))
𝓓𝑥⌈𝑥 = 𝓓𝑥(1 / 1+⌉-𝑥)
     = 1/(1+⌉-𝑥)² -𝓓𝑥⌉-𝑥
     = 1/(1+⌉-𝑥)² ⌉-𝑥
     = ⌉-𝑥/(1+⌉-𝑥)²
     = ⌉-𝑥/(1+⌉-𝑥) 1/(1+⌉-𝑥)
     = ⌉-𝑥/(1+⌉-𝑥) ⌈𝑥
     = 1/(⌉𝑥+1) ⌈𝑥
     = 1/(1+⌉𝑥) ⌈𝑥
     = ⌈-𝑥 ⌈𝑥
𝓓𝑥⌈𝑥 = 1-⌈𝑥 ⌈𝑥
𝓓𝑥(⌈(𝑥)) = (1 - ⌈(𝑥)) * ⌈(𝑥)
         = 𝓑(⌈(𝑥))
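
As a quick numerical check of the squash identities, here is a minimal Ruby sketch. The method names `squash`, `binary_competition` and `squash_slope` are assumptions made for this illustration and are not meant to describe the gem's API. It compares one minus squash of x with squash of minus x, and a central-difference slope with binary competition applied to the squash.

```ruby
# Squash (logistic) function, as defined above.
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Binary competition: x * (1 - x).
def binary_competition(x)
  x * (1.0 - x)
end

# Central-difference estimate of the derivative of squash at x.
def squash_slope(x, h = 1e-6)
  (squash(x + h) - squash(x - h)) / (2.0 * h)
end

[-2.0, 0.0, 1.5].each do |x|
  symmetry = (1.0 - squash(x)) - squash(-x)                  # expected near zero
  slope    = squash_slope(x) - binary_competition(squash(x)) # expected near zero
  puts format('x=%5.2f  symmetry=%.1e  slope=%.1e', x, symmetry, slope)
end
```

Both differences should come out at the level of floating point noise, which is all this sketch is meant to show.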

Unsquash

# Please let:
⌊(𝑥) := Math.log(𝑥)
# Recall that Log and Exp are inverses:
⌊(⌉(𝑥)) = 𝑥
⌊⌉𝑥 = 𝑥
# Recall that Log(1)=0
⌊(1) = 0
# Define the unsquash function:
⌋(𝑥) := Math.log(𝑥 / (1 - 𝑥))
⌋(𝑥) = ⌊(𝑥 / (1 - 𝑥))
⌋𝑥 = ⌊ 𝑥/(1-𝑥)
# Show that unsquash is the inverse of squash:
⌋(⌈(𝑥)) = ⌋(⌈(𝑥))
⌋⌈𝑥 = ⌋ ⌈𝑥
    = ⌊ ⌈𝑥/(1-⌈𝑥)  # by definition of unsquash, it's the log of...
    = ⌊⌈𝑥 - ⌊ 1-⌈𝑥
    = ⌊ ⌉𝑥/(⌉𝑥+1) - ⌊ 1-⌈𝑥  # by alternate definition of squash.
    = ⌊⌉𝑥 - ⌊ ⌉𝑥+1 - ⌊ 1-⌈𝑥
    = 𝑥 - ⌊ ⌉𝑥+1 - ⌊ 1-⌈𝑥
    = 𝑥 - ⌊ ⌉𝑥+1 - ⌊ 1-⌉𝑥/(⌉𝑥+1)
    = 𝑥 - ⌊ ⌉𝑥+1 - ⌊ (⌉𝑥+1-⌉𝑥)/(⌉𝑥+1)
    = 𝑥 - ⌊ ⌉𝑥+1 - ⌊ 1/(⌉𝑥+1)
    = 𝑥 - ⌊ ⌉𝑥+1 - (⌊1 - ⌊ ⌉𝑥+1)
    = 𝑥 - ⌊ ⌉𝑥+1 - (0 - ⌊ ⌉𝑥+1)
    = 𝑥 - ⌊ ⌉𝑥+1 - (-⌊ ⌉𝑥+1)
    = 𝑥 - ⌊ ⌉𝑥+1 + ⌊ ⌉𝑥+1
⌋⌈𝑥 = 𝑥
⌋(⌈(𝑥)) = 𝑥
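
The inverse relation can also be checked numerically. This is a minimal Ruby sketch; the method names `squash` and `unsquash` are assumed for the illustration, and `unsquash` is only meaningful for arguments strictly between 0 and 1.

```ruby
# Squash (logistic) function.
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Unsquash (logit): the log of a / (1 - a), for 0 < a < 1.
def unsquash(a)
  Math.log(a / (1.0 - a))
end

[-3.0, 0.0, 2.5].each do |x|
  puts format('x=%5.2f  unsquash(squash(x))=%.6f', x, unsquash(squash(x)))
end
```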

Activation and value of a neuron

# The activation of the h-th neuron (in level h, connecting to level i):
𝒂ₕ := ⌈(𝒃ₕ + ∑ᵢ(𝑾ₕᵢ * 𝒂ᵢ))
   = ⌈ 𝒃ₕ+𝑾ₕⁱ𝒂ᵢ
𝒂 = ⌈ 𝒃+𝑾𝒂'
⌋𝒂 = 𝒃+𝑾𝒂'
⌋𝒂ₕ = 𝒃ₕ+𝑾ₕⁱ𝒂ᵢ
⌋(𝒂ₕ) = 𝒃ₕ + ∑ᵢ(𝑾ₕᵢ * 𝒂ᵢ)
# The value of the h-th neuron is the unsquashed activation:
𝒗ₕ = ⌋(𝒂ₕ)
   = 𝒃ₕ + ∑ᵢ(𝑾ₕᵢ * 𝒂ᵢ)
𝒗 = 𝒃 + 𝑾 𝒂'
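
A minimal Ruby sketch of one neuron follows. The helper name `activation` and the sample bias, weights and inputs are assumptions for illustration only. It computes the activation as the squash of the bias plus the weighted sum, then recovers the value by unsquashing.

```ruby
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

def unsquash(a)
  Math.log(a / (1.0 - a))
end

# Hypothetical helper: activation of one neuron from its bias, its weights,
# and the activations of the level it connects to.
def activation(bias, weights, inputs)
  squash(bias + weights.zip(inputs).sum { |w, a| w * a })
end

bias    = 0.1
weights = [0.4, -0.6]
inputs  = [0.5, 0.25]

a = activation(bias, weights, inputs)
v = unsquash(a) # recovers bias + weighted sum: 0.1 + 0.2 - 0.15
puts format('activation=%.6f  value=%.6f', a, v)
```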

Mirroring

# The bias and weight of a neuron that roughly mirrors the value of another:
𝕧 := {-1, 0, 1}
𝕒 := ⌈(𝖇 + (𝖜 * 𝕒)) = ⌈ 𝖇+𝖜*𝕒
𝕧 := ⌋(𝕒) = ⌋𝕒
# Notice that:
𝕒 = ⌈(𝕧) = {⌈(-1), ⌈(0), ⌈(1)}
⌈(0) = ⌈0 = ½
# Find the bias and weight:
𝕧 = ⌋⌈(𝖇 + (𝖜 * 𝕒))
  = ⌋⌈𝖇+𝖜𝕒
  = ⌋⌈ 𝖇+𝖜⌈𝕧
  = 𝖇+𝖜⌈𝕧
𝕧 = 𝖇 + (𝖜 * ⌈(𝕧))
# Set the value to zero:
0 = 𝖇 + 𝖜⌈(0)
0 = 𝖇+𝖜⌈0
𝖇 = -𝖜⌈0
𝖇 = -½𝖜
𝖜 = -2𝖇
# Set the value to one and substitute the bias:
1 = 𝖇 + 𝖜⌈(1)
1 = 𝖇+𝖜⌈1
1 = -½𝖜+𝖜⌈1
1 = 𝖜(⌈1 - ½)
𝖜 = 1 / (⌈1 - ½)
𝖇 = ½ / (½ - ⌈1)
# Verify this works when value is negative one:
-1 = 𝖇 + (𝖜 * ⌈(-1))
-1 = 𝖇 + 𝖜⌈-1
-1 = -½𝖜 + 𝖜⌈-1
-1 = -½𝖜 + 𝖜(1-⌈1)
-1 = -½𝖜 + 𝖜 - 𝖜⌈1
-1 = ½𝖜 - 𝖜⌈1
1 = 𝖜⌈1 - ½𝖜
1 = 𝖜(⌈1 - ½)
𝖜 = 1 / (⌈1 - ½)
𝖜 = 1 / (⌈(1) - ½) # OK
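
The mirroring weight and bias are easy to evaluate numerically. Below is a minimal Ruby sketch; the constant names `MIRROR_WEIGHT` and `MIRROR_BIAS` and the method `mirrored_value` are assumptions for illustration. By the derivation above the mirror should be exact at the values -1, 0 and 1, and only approximate in between.

```ruby
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

# From the derivation: weight = 1 / (squash(1) - 1/2), bias = -weight / 2.
MIRROR_WEIGHT = 1.0 / (squash(1.0) - 0.5)
MIRROR_BIAS   = -0.5 * MIRROR_WEIGHT

# Value of the mirroring neuron, given the value v of the mirrored neuron.
def mirrored_value(v)
  MIRROR_BIAS + MIRROR_WEIGHT * squash(v)
end

[-1.0, -0.5, 0.0, 0.5, 1.0].each do |v|
  puts format('v=%5.2f  mirrored=%.4f', v, mirrored_value(v))
end
```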

Propagation of errors level 1(Perceptron)

# Value is the unsquashed activation:
𝒗ₕ := ⌋(𝒂ₕ)
𝒗 = ⌋𝒂
# Error in output value from errors in bias and weights:
𝒗ₕ + 𝒆ₕ := (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * 𝒂ᵢ)
𝒗+𝒆 = 𝒃+𝜺 + (𝑾+𝜺')𝒂'
𝒆 = 𝒃+𝜺 + (𝑾+𝜺')𝒂' - 𝒗
𝒆 = 𝒃 + 𝜺 + 𝑾𝒂' + 𝜺'𝒂' - 𝒗
𝒆 = 𝜺 + 𝜺'𝒂' + (𝒃 + 𝑾𝒂') - 𝒗
𝒆 = 𝜺 + 𝜺'𝒂' + (𝒗) - 𝒗
𝒆 = 𝜺 + 𝜺'𝒂'
𝒆ₕ = 𝜺ₕ + 𝜺ⁱ𝒂ᵢ
𝒆ₕ = 𝜺ₕ + ∑ᵢ(𝜺ᵢ * 𝒂ᵢ)
# Assume equipartition of errors:
∀ₓ{ 𝜺ₓ = 𝜀 }
𝒆ₕ = 𝜺ₕ + ∑ᵢ(𝜺ᵢ * 𝒂ᵢ)
   = 𝜀 + ∑ᵢ(𝜀 * 𝒂ᵢ)
   = 𝜀 + 𝜀∑𝒂ᵢ
   = 𝜀(1 + ∑𝒂ᵢ)
𝒆ₕ = 𝜀 * (1 + ∑ᵢ(𝒂ᵢ))
# *** Equipartitioned error level one ***
# Solve for 𝜀:
𝜀 = 𝒆ₕ / 1+∑𝒂ᵢ
𝜀 = 𝒆ₕ / (1 + ∑ᵢ(𝒂ᵢ))
# Mu
𝝁ₕ := 1 + ∑ᵢ(𝒂ᵢ)
𝝁 = 1+∑𝒂'
𝜀 = 𝒆ₕ / 𝝁ₕ
𝜀 = 𝒆/𝝁
𝒆 = 𝜀𝝁
𝒆ₕ = 𝜀 * 𝝁ₕ
# As an estimate, set 𝒂~½ and the length of ∑ᵢ at 𝑁:
𝜀 ~ 𝒆 / (1 + ½𝑁)
# Or very roughly:
𝜀 ~ 2𝒆/𝑁
# Activation error
𝒂ₕ + 𝜹ₕ = ⌈(𝒗ₕ + 𝒆ₕ)
𝒂+𝜹 = ⌈ 𝒗+𝒆
    ~ ⌈𝒗 + 𝒆𝓓𝒗⌈𝒗
    ~ ⌈𝒗 + 𝒆𝓑⌈𝒗
    ~ ⌈𝒗 + 𝒆𝓑𝒂
𝒂ₕ + 𝜹ₕ ~ 𝒂ₕ + (𝒆ₕ * 𝓑(𝒂ₕ))
        ~ 𝒂ₕ + (𝒆ₕ * (1 - 𝒂ₕ) * 𝒂ₕ)
𝜹ₕ ~ 𝒆ₕ * (1 - 𝒂ₕ) * 𝒂ₕ
   ~ 𝒆ₕ * 𝓑(𝒂ₕ)
𝜹 ~ 𝒆𝓑𝒂
  ~ 𝒆(1-𝒂)𝒂
# Recall that 𝒆=𝜀𝝁:
𝜹 ~ 𝜀𝝁(1-𝒂)𝒂
  ~ 𝜀𝝁𝓑𝒂
𝜹ₕ ~ 𝜀 * 𝝁ₕ * 𝓑(𝒂ₕ)
   ~ 𝜀 * 𝝁ₕ * (1 - 𝒂ₕ) * 𝒂ₕ
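
To see the level one result in numbers, here is a minimal Ruby sketch. The helper name `level_one_errors` and the sample bias, weights, inputs and error size are all assumptions for illustration. It shifts the bias and every weight of a single neuron by the same small error and compares the actual errors in value and activation with the equipartition formulas.

```ruby
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Hypothetical helper: compares the actual errors caused by shifting the bias
# and every weight by eps against the first-order formulas.
def level_one_errors(bias, weights, inputs, eps)
  value     = bias + weights.zip(inputs).sum { |w, a| w * a }
  perturbed = (bias + eps) + weights.zip(inputs).sum { |w, a| (w + eps) * a }
  mu        = 1.0 + inputs.sum # one plus the sum of connected activations
  a         = squash(value)
  {
    e_actual:      perturbed - value,
    e_formula:     eps * mu,
    delta_actual:  squash(perturbed) - a,
    delta_formula: eps * mu * a * (1.0 - a)
  }
end

errors = level_one_errors(0.3, [0.8, -0.5, 0.2], [0.9, 0.4, 0.6], 1e-4)
errors.each { |name, x| puts format('%-14s %.6e', name, x) }
```

The error in the value is linear in the shift, so the two should agree to rounding. The activation error is a first-order estimate and should agree only up to terms of second order in the error.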

Vanishing small errors

# Assume 𝜀²~0
𝜀² ~ 0
# Consider 𝜀𝜹
𝜀 * 𝜹ₕ = 𝜀 * 𝜀 * 𝝁ₕ * 𝓑(𝒂ₕ)
       = 𝜀²𝝁𝓑𝒂
       ~ 0 * 𝝁𝓑𝒂
𝜀𝜹 ~ 0
𝜀 * 𝜹ₕ ~ 0

Propagation of errors level 2

# Error in output value from errors in bias and weights and activation:
𝒗ₕ + 𝒆ₕ := (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * (𝒂ᵢ + 𝜹ᵢ))
𝒗+𝒆 = 𝒃+𝜺 + (𝑾+𝜺')(𝒂'+𝜹')
    = 𝒃 + 𝜀 + 𝑾𝒂' + 𝑾𝜹' + 𝜺'𝒂' + 𝜺'𝜹'
    ~ 𝒃 + 𝜀 + 𝑾𝒂' + 𝑾𝜹' + 𝜺'𝒂' # 𝜀𝜹 vanishes
    ~ 𝒃 + 𝑾𝒂' + 𝑾𝜹' + 𝜀 + 𝜺'𝒂'
    ~ 𝒗 + 𝑾𝜹' + 𝜀 + 𝜺'𝒂'
𝒆 ~ 𝑾𝜹' + 𝜀 + 𝜺'𝒂'
𝒆 ~ 𝑾𝜹' + 𝜀(1+∑𝒂')
𝒆 ~ 𝑾𝜹' + 𝜀𝝁
𝒆 ~ 𝜀𝝁 + 𝑾𝜹' # Same as level one with an extra +𝑾𝜹'
# Recall 𝜹 ~ 𝒆𝓑𝒂:
𝒂+𝜹 = ⌈ 𝒗+𝒆
    ~ 𝒂 + 𝒆𝓑𝒂
𝜹 ~ 𝒆𝓑𝒂
# Substitute out 𝜹':
𝒆 ~ 𝜀𝝁 + 𝑾𝜹'
  ~ 𝜀𝝁 + 𝑾 𝒆'𝓑𝒂'
  ~ 𝜀𝝁 + 𝑾 𝓑𝒂'𝒆'
# Substitute out 𝒆':
𝒆 ~ 𝜀𝝁 + 𝑾 𝓑𝒂'𝒆'
  ~ 𝜀𝝁 + 𝑾 𝓑𝒂'(𝜀𝝁' + 𝑾'𝜹")
  ~ 𝜀𝝁 + 𝑾 𝓑𝒂'𝜀𝝁' + 𝑾 𝓑𝒂'𝑾'𝜹"
  ~ 𝜀𝝁 + 𝜀𝑾 𝓑𝒂'𝝁' + 𝑾 𝓑𝒂'𝑾'𝜹" # reorder
  ~ 𝜀(𝝁 + 𝑾 𝓑𝒂'𝝁') + 𝑾 𝓑𝒂'𝑾'𝜹"
# Introduce 𝜧:
𝜧ₕⁱ𝝁ᵢ := ∑ᵢ 𝑾ₕᵢ𝓑𝒂ᵢ𝝁ᵢ
𝜧 𝝁' = 𝑾 𝓑𝒂'𝝁'
# Substitute in 𝜧:
𝒆 ~ 𝜀(𝝁 + 𝑾 𝓑𝒂'𝝁') + 𝑾 𝓑𝒂'𝑾'𝜹"
  ~ 𝜀(𝝁 + 𝜧 𝝁') + 𝜧 𝑾'𝜹"
# *** Equipartitioned error level two ***
# For level two, 𝜹"=0
𝒆 ~ 𝜀(𝝁 + 𝜧 𝝁')
𝒆ₕ ~ 𝜀 * (𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ)
# Solve for 𝜀:
𝜀 ~ 𝒆 / (𝝁 + 𝜧 𝝁')
𝜀ₕ ~ 𝒆ₕ / (𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ)
# Notice that:
0 < 𝒂 < 1
0 < 𝓑𝒂=(1-𝒂)𝒂 ≤ 0.25 = ¼
# So there's an upper bound for 𝒆:
𝒆 ~ 𝜀(𝝁 + 𝜧 𝝁')
  ~ 𝜀(𝝁 + 𝑾 𝓑𝒂'𝝁')
|𝒆| < |𝜀(𝝁 + ¼𝑾 𝝁')|
# Assume 𝒂 is somewhat random about 0.5=½ in a level of large size 𝑁:
𝝁 = 1+∑𝒂'  ⇒  𝔪 ~ 1+½𝑁 ~ ½𝑁
|𝒆| <~ |𝜀(𝔪 + ¼𝔪 ∑𝑾)|
# Consider the case when weights are random plus or minus one.
# Let this be like a random walk of 𝑁 steps.
# Then ∑𝑾 ~ √𝑁:
|𝒆| <~ |𝜀(𝔪 + ¼𝔪 √𝑁)|
    <~ |𝜀(½𝑁 + ¼*½𝑁*√𝑁)|
    <~ 𝑁|𝜀(½ + ¼*½√𝑁)|
|𝒆| <~ 𝑁√(𝑁)|𝜀|/8
# If you don't believe the random walk and are pessimistic, you might prefer
# using 𝑁²:
𝒆 <~ 𝜀𝑁√𝑁/8 < 𝜀𝑁²/8
𝜀 ~> 8𝒆 / 𝑁√𝑁 > 8𝒆/𝑁²

Explicit propagation of errors level 2

𝒗ₕ := 𝒃ₕ + ∑ᵢ(𝑾ₕᵢ * 𝒂ᵢ)
𝒗ₕ + 𝒆ₕ := (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * (𝒂ᵢ + 𝜹ᵢ))
𝒗ᵢ + 𝒆ᵢ := (𝒃ᵢ + 𝜺ᵢ) + ∑ⱼ((𝑾ᵢⱼ + 𝜺ⱼ) * (𝒂ⱼ + 𝜹ⱼ))
𝒂ᵢ + 𝜹ᵢ := ⌈(𝒗ᵢ + 𝒆ᵢ)
        = ⌈((𝒃ᵢ + 𝜺ᵢ) + ∑ⱼ((𝑾ᵢⱼ + 𝜺ⱼ) * (𝒂ⱼ + 𝜹ⱼ)))
        = ⌈(𝒃ᵢ + 𝜺ᵢ + ∑ⱼ(𝑾ᵢⱼ*𝒂ⱼ + 𝜺ⱼ*𝒂ⱼ + 𝑾ᵢⱼ*𝜹ⱼ + 𝜺ⱼ*𝜹ⱼ))
        = ⌈(𝒃ᵢ + 𝜺ᵢ + 𝑾ᵢʲ𝒂ⱼ + 𝜺ʲ𝒂ⱼ + 𝑾ᵢʲ𝜹ⱼ + 𝜺ʲ𝜹ⱼ)
        ~ ⌈(𝒃ᵢ + 𝜺ᵢ + 𝑾ᵢʲ𝒂ⱼ + 𝜺ʲ𝒂ⱼ + 𝑾ᵢʲ𝜹ⱼ) # 𝜺𝜹 vanishes
        ~ ⌈(𝒃ᵢ + 𝑾ᵢʲ𝒂ⱼ + 𝜺ᵢ + 𝜺ʲ𝒂ⱼ + 𝑾ᵢʲ𝜹ⱼ)
        ~ ⌈(𝒃ᵢ + 𝑾ᵢʲ𝒂ⱼ + 𝜀 + 𝜀∑𝒂ⱼ + 𝑾ᵢʲ𝜹ⱼ) # All 𝜺 are the same 𝜀
        ~ ⌈(𝒃ᵢ + 𝑾ᵢʲ𝒂ⱼ + 𝜀(1 + ∑𝒂ⱼ) + 𝑾ᵢʲ𝜹ⱼ)
        ~ ⌈(𝒃ᵢ + 𝑾ᵢʲ𝒂ⱼ + 𝜀𝝁ᵢ + 𝑾ᵢʲ𝜹ⱼ) # 𝝁ᵢ=1+∑𝒂ⱼ as 𝝁=1+∑𝒂'
        ~ 𝒂ᵢ + (𝜀𝝁ᵢ + 𝑾ᵢʲ𝜹ⱼ) 𝓑𝒂ᵢ
        ~ 𝒂ᵢ + (𝜀𝝁ᵢ + 𝑾ᵢʲ𝜹ⱼ)(1-𝒂ᵢ)𝒂ᵢ
𝒂ᵢ + 𝜹ᵢ ~ 𝒂ᵢ + (𝜀𝝁ᵢ + ∑ⱼ(𝑾ᵢⱼ * 𝜹ⱼ)) * (1 - 𝒂ᵢ) * 𝒂ᵢ
# Solve for 𝜹ᵢ:
𝜹ᵢ ~ (𝜀𝝁ᵢ + ∑ⱼ(𝑾ᵢⱼ * 𝜹ⱼ)) * (1 - 𝒂ᵢ) * 𝒂ᵢ
𝜹ᵢ ~ (𝜀𝝁ᵢ+𝑾ᵢʲ𝜹ⱼ)(1-𝒂ᵢ)𝒂ᵢ
𝜹ᵢ ~ 𝜀𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ + 𝑾ᵢʲ𝜹ⱼ(1-𝒂ᵢ)𝒂ᵢ
# Consider the case where the j-th level is error free input:
𝜹ᵢ ~ 𝜀𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ # 𝜹ⱼ is zero
𝒗ₕ + 𝒆ₕ := (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * (𝒂ᵢ + 𝜹ᵢ))
        ~ (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * (𝒂ᵢ + 𝜀𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ))
        ~ 𝒃ₕ + 𝜺ₕ + 𝑾ₕⁱ(𝒂ᵢ + 𝜀𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ) + 𝜺ⁱ(𝒂ᵢ + 𝜀𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ)
        ~ 𝒃ₕ + 𝜺ₕ + 𝑾ₕⁱ𝒂ᵢ + 𝜀𝑾ₕⁱ𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ + 𝜺ⁱ𝒂ᵢ + 𝜺ⁱ𝜀𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ
        ~ 𝒃ₕ + 𝜺ₕ + 𝑾ₕⁱ𝒂ᵢ + 𝜀𝑾ₕⁱ𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ + 𝜺ⁱ𝒂ᵢ # 𝜺ⁱ𝜀 vanishes
        ~ 𝒃ₕ + 𝑾ₕⁱ𝒂ᵢ + 𝜀𝑾ₕⁱ𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ + 𝜺ₕ + 𝜺ⁱ𝒂ᵢ # reordered terms
        ~ 𝒗ₕ + 𝜀𝑾ₕⁱ𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ + 𝜺ₕ + 𝜺ⁱ𝒂ᵢ
        ~ 𝒗ₕ + 𝜀𝑾ₕⁱ𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ + 𝜀(1+∑𝒂ᵢ)
        ~ 𝒗ₕ + 𝜀(1+∑𝒂ᵢ) + 𝜀𝑾ₕⁱ𝝁ᵢ(1-𝒂ᵢ)𝒂ᵢ # reordered
        ~ 𝒗ₕ + 𝜀𝝁ₕ + 𝜀𝜧ₕⁱ𝝁ᵢ # 𝜧 = 𝑾𝓑𝒂'
𝒗ₕ + 𝒆ₕ ~ 𝒗ₕ + 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ)
𝒆ₕ ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ)
𝜀 ~ 𝒆ₕ / (𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ)
𝜀 ~ 𝒆 / (𝝁 + 𝜧 𝝁') # OK!
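
The level two result can be checked against a small network. The Ruby sketch below is only an illustration under stated assumptions: the helper names `level_values` and `level_two_errors`, the three-level layout (error free input, three middle neurons, one output neuron) and all the numbers are made up for the example. It shifts every bias and weight by the same small error, measures the actual error in the output value, and compares it with the first-order prediction from the derivation above.

```ruby
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Values of one level: bias plus weighted sum of the connected activations.
def level_values(biases, weights, inputs)
  biases.zip(weights).map do |b, row|
    b + row.zip(inputs).sum { |w, a| w * a }
  end
end

# Hypothetical helper: returns [actual, predicted] error in the single output
# value when every bias and weight is shifted by eps; the input is error free.
def level_two_errors(inputs, b_mid, w_mid, b_out, w_out, eps)
  a_mid = level_values(b_mid, w_mid, inputs).map { |v| squash(v) }
  v_out = level_values([b_out], [w_out], a_mid).first

  shift   = ->(xs) { xs.map { |x| x + eps } }
  w_mid_e = w_mid.map { |row| shift.(row) }
  a_mid_e = level_values(shift.(b_mid), w_mid_e, inputs).map { |v| squash(v) }
  v_out_e = level_values([b_out + eps], [shift.(w_out)], a_mid_e).first

  mu_out = 1.0 + a_mid.sum  # mu of the output neuron
  mu_mid = 1.0 + inputs.sum # mu of each middle neuron (all see the same input)
  m_mu   = w_out.zip(a_mid).sum { |w, a| w * a * (1.0 - a) * mu_mid }

  [v_out_e - v_out, eps * (mu_out + m_mu)]
end

actual, predicted = level_two_errors(
  [0.2, 0.9],
  [0.1, -0.3, 0.5], [[0.4, -0.7], [1.0, 0.2], [-0.5, 0.3]],
  0.05, [0.8, -1.1, 0.6],
  1e-4
)
puts format('actual=%.6e  predicted=%.6e', actual, predicted)
```

The two numbers should differ only by terms of second order in the error, which is what the vanishing small errors assumption drops.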

Explicit propagation of errors level 3

# Given:
𝒂ₕ := ⌈(𝒗ₕ)
𝒂ₕ + 𝜹ₕ := ⌈(𝒗ₕ + 𝒆ₕ)
𝒗ₕ := 𝒃ₕ + ∑ᵢ(𝑾ₕᵢ * 𝒂ᵢ)
𝒗ₕ + 𝒆ₕ := (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * (𝒂ᵢ + 𝜹ᵢ))
𝝁ₕ := 1 + ∑ᵢ(𝒂ᵢ)
𝜧ₕⁱ𝝁ᵢ := ∑ᵢ(𝑾ₕᵢ * (1 - 𝒂ᵢ) * 𝒂ᵢ * 𝝁ᵢ)
       = 𝑾ₕⁱ𝓑𝒂ᵢ𝝁ᵢ
# Assume:
∀ₓ{ 𝜺ₓ = 𝜀 }
𝜀² ~ 0
𝜀𝜹 ~ 0
# Recall:
𝓓𝑥(⌈(𝑥)) = ⌈(𝑥) * (1 - ⌈(𝑥))
         = 𝓑(⌈(𝑥))
⌈(𝑥 + 𝜀) ~ ⌈(𝑥) + 𝜀 * 𝓓𝑥(⌈(𝑥))
         ~ ⌈(𝑥) + 𝜀 * ⌈(𝑥) * (1 - ⌈(𝑥))
         ~ ⌈(𝑥) + 𝜀 * 𝓑(⌈(𝑥))
# Note that one may transpose indices for each level:
ₕ⬌ᵢ⬌ⱼ⬌ₖ
# Solve for level 3 𝜀.
## 𝜹ᵢ:
𝒂ᵢ + 𝜹ᵢ := ⌈(𝒗ᵢ + 𝒆ᵢ)
        ~ ⌈𝒗ᵢ + 𝒆ᵢ * 𝓑⌈𝒗ᵢ
        ~ 𝒂ᵢ + 𝒆ᵢ * 𝓑⌈𝒗ᵢ
𝜹ᵢ ~ 𝒆ᵢ * 𝓑⌈𝒗ᵢ
   ~ 𝒆ᵢ * 𝓑𝒂ᵢ
𝜹ᵢ ~ 𝒆ᵢ * (1-𝒂ᵢ) * 𝒂ᵢ
## Expand first level and solve for 𝒆ₕ:
𝒗ₕ + 𝒆ₕ := (𝒃ₕ + 𝜺ₕ) + ∑ᵢ((𝑾ₕᵢ + 𝜺ᵢ) * (𝒂ᵢ + 𝜹ᵢ))
        = 𝒃ₕ+𝜀 + (𝑾ₕⁱ+𝜺ⁱ)(𝒂ᵢ+𝜹ᵢ)
        = 𝒃ₕ+𝜀 + 𝑾ₕⁱ𝒂ᵢ + 𝜺ⁱ𝒂ᵢ + 𝑾ₕⁱ𝜹ᵢ + 𝜺ⁱ𝜹ᵢ
        ~ 𝒃ₕ+𝜀 + 𝑾ₕⁱ𝒂ᵢ + 𝜺ⁱ𝒂ᵢ + 𝑾ₕⁱ𝜹ᵢ # 𝜺𝜹 vanishes
        ~ 𝒃ₕ+𝑾ₕⁱ𝒂ᵢ + 𝜀+𝜺ⁱ𝒂ᵢ + 𝑾ₕⁱ𝜹ᵢ
        ~ 𝒗ₕ + 𝜀+𝜺ⁱ𝒂ᵢ + 𝑾ₕⁱ𝜹ᵢ
𝒆ₕ ~ 𝜀+𝜺ⁱ𝒂ᵢ + 𝑾ₕⁱ𝜹ᵢ
   ~ 𝜀(1+∑𝒂ᵢ) + 𝑾ₕⁱ𝜹ᵢ
   ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝜹ᵢ
## Substitute out 𝜹ᵢ:
𝒆ₕ ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝜹ᵢ # 𝒆=𝜀𝝁+𝑾𝜹'
   ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝒆ᵢ𝓑𝒂ᵢ
   ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝓑𝒂ᵢ𝒆ᵢ
## Substitute out 𝒆ᵢ:
𝒆ₕ ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝓑𝒂ᵢ𝒆ᵢ
   ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝓑𝒂ᵢ(𝜀𝝁ᵢ + 𝑾ᵢʲ𝜹ⱼ) # 𝒆~𝜀𝝁+𝑾𝜹'
   ~ 𝜀𝝁ₕ + 𝑾ₕⁱ𝓑𝒂ᵢ𝜀𝝁ᵢ + 𝑾ₕⁱ𝓑𝒂ᵢ𝑾ᵢʲ𝜹ⱼ
   ~ 𝜀𝝁ₕ + 𝜀𝑾ₕⁱ𝓑𝒂ᵢ𝝁ᵢ + 𝑾ₕⁱ𝓑𝒂ᵢ𝑾ᵢʲ𝜹ⱼ # reorder
   ~ 𝜀𝝁ₕ + 𝜀𝜧ₕⁱ𝝁ᵢ + 𝜧ₕⁱ𝑾ᵢʲ𝜹ⱼ # 𝜧 = 𝑾𝓑𝒂'
𝒆ₕ ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ) + 𝜧ₕⁱ𝑾ᵢʲ𝜹ⱼ # Level 2 plus an additional term due to 𝜹ⱼ
# Recall that in level 2, 𝜹ⱼ was zero, but level three continues...
𝒆ₕ ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ) + 𝜧ₕⁱ𝑾ᵢʲ𝜹ⱼ
   ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ) + 𝜧ₕⁱ𝑾ᵢʲ𝓑𝒂ⱼ𝒆ⱼ # 𝜹~𝓑𝒂𝒆
   ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ) + 𝜧ₕⁱ𝜧ᵢʲ𝒆ⱼ
   ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ) + 𝜧ₕⁱ𝜧ᵢʲ(𝜀𝝁ⱼ+𝑾ⱼᵏ𝜹ₖ) # 𝒆~𝜀𝝁+𝑾𝜹'
   ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ) + 𝜀𝜧ₕⁱ𝜧ᵢʲ𝝁ⱼ + 𝜧ₕⁱ𝜧ᵢʲ𝑾ⱼᵏ𝜹ₖ
   ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ + 𝜧ₕⁱ𝜧ᵢʲ𝝁ⱼ) + 𝜧ₕⁱ𝜧ᵢʲ𝑾ⱼᵏ𝜹ₖ
# For level three, 𝜹ₖ is zero:
𝒆ₕ ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ + 𝜧ₕⁱ𝜧ᵢʲ𝝁ⱼ)

General propagation of errors

# The above establishes a clear pattern:
𝒆ₕ ~ 𝜀(𝝁ₕ + 𝜧ₕⁱ𝝁ᵢ + 𝜧ₕⁱ𝜧ᵢʲ𝝁ⱼ + 𝜧ₕⁱ𝜧ᵢʲ𝜧ⱼᵏ𝝁ₖ + ...)
𝒆 ~ 𝜀(𝝁 + 𝜧 𝝁' + 𝜧 𝜧'𝝁" + 𝜧 𝜧'𝜧"𝝁"' + ...)
# Error bound estimate:
0 < 𝒂 < 1
0 < 𝓑𝒂=(1-𝒂)𝒂 ≤ 0.25 = ¼
|𝓑𝒂| ~ ¼
|𝒂| ~ ½
|𝝁| ~ 1+∑|𝒂'|
    ~ 1+∑½
    ~ 1+½𝑁 := 𝔪
|∑𝑾| ~ √𝑁 # random walk
|𝜧| ~ |𝑾||𝓑𝒂|
    ~ ¼√𝑁
|𝒆| ~ |𝜀|(|𝝁| + |𝜧 𝝁'| + |𝜧 𝜧'𝝁"| + |𝜧 𝜧'𝜧"𝝁"'| + ...)
    ~ |𝜀|(𝔪 + |𝜧|𝔪 + |𝜧 𝜧'|𝔪 + |𝜧 𝜧'𝜧"|𝔪 + ...)
    ~ |𝜀|𝔪(1 + |𝜧| + |𝜧|² + |𝜧|³ + ...)
# Consider very large 𝑁 on each level in an 𝑛+2 layer network:
|𝒆| ~ |𝜀|½𝑁(¼√𝑁)ⁿ
# For a 3-layer network (input, middle, and output layers), 𝑛=1:
|𝒆| ~ |𝜀|𝔪(1 + |𝜧|)
    ~ |𝜀|𝑁√𝑁 / 8 # 𝑁>>1, large 𝑁
|𝜀| ~ 8|𝒆| / 𝑁√𝑁 # 𝑁>>1
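
The rough error bound estimate can be tabulated with a few lines of Ruby. This is a minimal sketch under the same assumptions as the estimate above (activations near one half, random-walk weights); the helper name `error_amplification` and its parameters are hypothetical. It returns the estimated ratio of the value error to the equipartitioned error for a network of n + 2 layers, and the loop compares the three-layer case with the large-size shortcut.

```ruby
# Hypothetical helper: rough |e| / |eps| for a network with n + 2 layers and
# about n_size neurons per level, under the random-walk assumption.
def error_amplification(n_size, n)
  m     = 1.0 + 0.5 * n_size       # estimate of mu
  big_m = 0.25 * Math.sqrt(n_size) # estimate of |W||B(a)|
  m * (0..n).sum { |k| big_m**k }  # m * (1 + M + M^2 + ... + M^n)
end

[10, 100, 1000].each do |size|
  series  = error_amplification(size, 1)
  large_n = size * Math.sqrt(size) / 8.0
  puts format('N=%4d  series=%.1f  N*sqrt(N)/8=%.1f', size, series, large_n)
end
```

For small levels the shortcut noticeably underestimates the series, since it drops the leading ones; the two only converge as the level size grows.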

Legacy

# In trying to find the recursion pattern, I came across several interesting
# expressions.  I define them all here, including the ones actually used above:
𝓑𝒂 := 𝒂(1-𝒂)
𝒂 := ⌈𝒗
𝒗 := 𝒃 + 𝑾 𝒂'
𝒂 = ⌈ 𝒃+𝑾𝒂'
𝒂+𝜹 := ⌈(𝒗+𝒆)
𝒗 = ⌋𝒂
𝒗+𝒆 := 𝒃+𝜺 + (𝑾+𝜺')(𝒂'+𝜹')
𝝁 := 1+∑𝒂'
𝜧 𝝁' := 𝑾 𝓑𝒂'𝝁'
# Legacy:
𝝀 := 𝓑𝒂 𝝁
𝜿 := 𝜧 𝝁' = 𝑾 𝓑𝒂'𝝁' = 𝑾 𝝀'
𝜾 := 𝜧 𝜿' = 𝜧 𝜧'𝝁"