# Grid of Resistors II
## Compute the effective resistance of a 2n+1 by 2n+2 grid of resistors
## Using SOR with red-black ordering


In [11]:
using BenchmarkTools

Consider again the reference implementation of the resistance computation.

In [12]:
function compute_resistance(n, nreps = 100)
    # Original MATLAB version, Alan Edelman, January 1994
    # Julia translations, Andreas Noack, June 2018

    # assume n and omega already defined or take
    # the following values for the optimal omega
    μ = (cos(π/(2*n)) + cos(π/(2*n + 1)))/2
    ω = 2*(1 - sqrt(1 - μ^2))/μ^2
    # (See page 409 of Strang Intro to Applied Math , this is equation 16)

    # Initialize voltages
    v = zeros(Float32, 2*n + 1, 2*n + 2)

    # Define Input Currents
    b = copy(v)
    b[n + 1, (n + 1):(n + 2)]  = [1 -1]

    # Makes indices easy to read
    ie = 2:2:(2*n)      # even i's
    io = 3:2:(2*n - 1)  # odd i's
    je = 2:2:(2*n)      # even j's
    jo = 3:2:(2*n + 1)  # odd j's

    # Jacobi Steps
    for k in 1:nreps
        v[ie, je] = (1 - ω) * v[ie,je] +
                      ω*(v[ie .+ 1, je] + v[ie .- 1, je] + v[ie, je .+ 1] + v[ie, je .- 1] + b[ie, je])/4
        v[io, jo] = (1 - ω) * v[io, jo] +
                      ω*(v[io .+ 1, jo] + v[io .- 1, jo] + v[io, jo .+ 1] + v[io, jo .- 1] + b[io, jo])/4
        v[ie, jo] = (1 - ω) * v[ie, jo] +
                      ω*(v[ie .+ 1, jo] + v[ie .- 1, jo] + v[ie, jo .+ 1] + v[ie, jo .- 1] + b[ie, jo])/4
        v[io, je] = (1 - ω) * v[io, je] +
                      ω*(v[io .+ 1, je] + v[io .- 1, je] + v[io, je .+ 1] + v[io, je .- 1] + b[io, je])/4
    end
    # Compute resistance = v_A - v_b = 2 v_A
    r = 2*v[n + 1, n + 1]
    return v, r
end


compute_resistance (generic function with 2 methods)
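
For reference, the relaxation parameter and the update applied at each interior grid point can be read off the code above. The following is only a transcription of that code into formulas, with $v_{i,j}$ the voltage and $b_{i,j}$ the input current at grid point $(i, j)$:

$$
\mu = \frac{1}{2}\left(\cos\frac{\pi}{2n} + \cos\frac{\pi}{2n + 1}\right),
\qquad
\omega = \frac{2\left(1 - \sqrt{1 - \mu^2}\right)}{\mu^2}
$$

$$
v_{i,j} \leftarrow (1 - \omega)\, v_{i,j} + \frac{\omega}{4}\left(v_{i+1,j} + v_{i-1,j} + v_{i,j+1} + v_{i,j-1} + b_{i,j}\right)
$$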

In the previous exercise, we saw how the dot syntax for broadcasting reduces the number of allocations. However, the function still allocates a lot, so the aim of this exercise is to reduce the number of allocations even further.

The array "slicing" is what causes the many allocations, since slicing creates copies. A typical optimization in hot Julia code is to *devectorize* the code, since that avoids these allocations.
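
A minimal sketch of the copy semantics of slicing, on a small matrix that is not part of the exercise. The names `A`, `s`, and `w` are chosen here for illustration only:

```julia
A = zeros(3, 3)

# Slicing on the right-hand side allocates a new array and copies the data.
s = A[1:2, 1:2]
s[1, 1] = 1.0
copy_left_A_untouched = (A[1, 1] == 0.0)   # writing to the copy does not change A

# A view refers to the same memory as A, so no copy of the data is made.
w = @view A[1:2, 1:2]
w[1, 1] = 2.0
view_wrote_through = (A[1, 1] == 2.0)      # writing to the view changes A
```

Every `v[ie, je]`-style expression on the right-hand side of the reference implementation makes such a copy in each iteration, which is where the allocations come from.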

**Exercise** As a first step in the process of optimizing, implement a function that computes the updated value at a single grid point `(i, j)`, with the signature
```julia
stencil(v::Matrix, b::Matrix, ω::Number, i::Int, j::Int)
```

In [22]:
stencil(v::Matrix, b::Matrix, ω::Number, i::Int, j::Int) = 
    (1 - ω) * v[i,j] +
        ω*(v[i + 1, j] + v[i - 1, j] + v[i, j + 1] + v[i, j - 1] + b[i, j])/4

stencil (generic function with 1 method)
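
As a quick sanity check, the function can be evaluated by hand on a tiny grid. The sketch below repeats the definition so that it runs on its own; the 3×3 matrices and the values of `ω` are made up for illustration and are not part of the exercise:

```julia
# Same definition as above, repeated so that the snippet is self-contained.
stencil(v::Matrix, b::Matrix, ω::Number, i::Int, j::Int) =
    (1 - ω) * v[i,j] +
        ω*(v[i + 1, j] + v[i - 1, j] + v[i, j + 1] + v[i, j - 1] + b[i, j])/4

# Centre point is 0 and its four neighbours sum to 10.
v = [0.0 1.0 0.0;
     2.0 0.0 3.0;
     0.0 4.0 0.0]
b = zeros(3, 3)

plain_average = stencil(v, b, 1.0, 2, 2)   # ω = 1 gives 10/4 = 2.5
over_relaxed  = stencil(v, b, 1.5, 2, 2)   # ω = 1.5 gives 1.5*10/4 = 3.75
```

With `ω = 1` the update is a plain average of the neighbours plus the source term, and a larger `ω` pushes the new value further in the same direction.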

**Exercise** Now use this function to write a devectorized version of `compute_resistance` and call it `compute_resistance_devec`. The trick is to use a double `for` loop over the ranges `ie`, `io`, `je`, and `jo` as appropriate. Make sure that the function produces the correct result and time it. This version should have seven allocations.

In [30]:
function compute_resistance_devec(n, reps = 100)
    # assume n and omega already defined or take
    # the following values for the optimal omega
    μ = (cos(π/(2*n)) + cos(π/(2*n + 1)))/2
    ω = 2*(1 - sqrt(1 - μ^2))/μ^2
    # (See page 409 of Strang Intro to Applied Math , this is equation 16)

    # Initialize voltages
    v = zeros(Float32, 2*n + 1, 2*n + 2)

    # Define Input Currents
    b = copy(v)
    b[n + 1, (n + 1):(n + 2)]  = [1 -1]

    # Makes indices easy to read
    ie = 2:2:(2*n)      # even i's
    io = 3:2:(2*n - 1)  # odd i's
    je = 2:2:(2*n)      # even j's
    jo = 3:2:(2*n + 1)  # odd j's

    # Jacobi Steps
    for k in 1:reps
        for j in je, i in ie
            v[i,j] = stencil(v, b, ω, i, j)
        end

        for j in jo, i in io
            v[i,j] = stencil(v, b, ω, i, j)
        end
        
        for j in jo, i in ie
            v[i,j] = stencil(v, b, ω, i, j)
        end
        
        for j in je, i in io
            v[i,j] = stencil(v, b, ω, i, j)
        end
    end
    # Compute resistance = v_A - v_b = 2 v_A
    r = 2*v[n + 1, n + 1]
    return v, r
end

compute_resistance_devec (generic function with 2 methods)

In [24]:
@btime compute_resistance_devec(400);

  447.523 ms (7 allocations: 9.80 MiB)


**Exercise** Look at the output from `@code_lowered compute_resistance_devec(400, 100)` and identify the calls to `stencil`.

**Exercise** Look at the output from `@code_typed compute_resistance_devec(400, 100)` and verify that there are no calls to `stencil`.

Julia has *inlined* the function call to avoid the overhead of the function call and to allow compiler optimizations of the whole loop body.
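
The effect can also be seen on a much smaller function. The sketch below uses hypothetical helper names (`add_one`, `add_one_noinline`, `twice_inlined`, `twice_called`) that do not appear in the exercise, and it assumes Julia 0.7 or later, where `code_typed` lives in the `InteractiveUtils` standard library. The exact printed form depends on the Julia version, so no output is shown:

```julia
using InteractiveUtils  # provides code_typed outside the REPL (Julia 0.7 and later)

add_one(x) = x + 1
@noinline add_one_noinline(x) = x + 1   # ask the compiler not to inline this one

twice_inlined(x) = 2 * add_one(x)
twice_called(x)  = 2 * add_one_noinline(x)

# Typed and optimized code for each caller with an Int argument.
# The first is expected to contain only integer arithmetic, while the
# second is expected to still contain a call to add_one_noinline.
println(code_typed(twice_inlined, (Int,)))
println(code_typed(twice_called, (Int,)))
```

Both callers return the same values; only the generated code differs.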

**Exercise** The function iterates over the arrays four times, which is more than necessary. Identify any dependencies between the four double loops and reduce them to two double loops. Time the result.
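
One way to reason about the dependencies is through the red-black coloring named in the title: color a point by the parity of `i + j` and check which colors the four-point stencil reads from. The helper names below (`is_red`, `neighbors_have_opposite_color`) are made up for this sketch:

```julia
# A point is "red" when i + j is even and "black" otherwise.
is_red(i, j) = iseven(i + j)

# Check that every stencil neighbour of every point in rows × cols
# has the opposite color of the point itself.
function neighbors_have_opposite_color(rows, cols)
    for j in cols, i in rows
        for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1))
            is_red(i + di, j + dj) == is_red(i, j) && return false
        end
    end
    return true
end

opposite = neighbors_have_opposite_color(2:8, 2:9)

# Even-even and odd-odd points share one color; even-odd and odd-even share the other.
same_color_1 = is_red(2, 2) && is_red(3, 3)
same_color_2 = !is_red(2, 3) && !is_red(3, 2)
```

Since an update of a red point reads only black points and vice versa, all points of one color can be updated in any order within a sweep, which is what allows two of the four loops to be fused.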

In [25]:
@code_typed compute_resistance_devec(400, 100)

CodeInfo(:(begin 
        NewvarNode(:(r))
        SSAValue(26) = (Base.div_float)(3.141592653589793, (Base.sitofp)(Float64, (Base.mul_int)(2, n)::Int64)::Float64)::Float64
        $(Expr(:inbounds, false))
        # meta: location math.jl cos 419
        SSAValue(28) = $(Expr(:foreigncall, ("cos", "libopenlibm"), Float64, svec(Float64), SSAValue(26), 0))
        # meta: location math.jl nan_dom_err 300
        unless (Base.and_int)((Base.ne_float)(SSAValue(28), SSAValue(28))::Bool, (Base.not_int)((Base.ne_float)(SSAValue(26), SSAValue(26))::Bool)::Bool)::Bool goto 10
        #temp#@_32 = (Base.Math.throw)($(QuoteNode(DomainError())))::Union{}
        goto 12
        10: 
        #temp#@_32 = SSAValue(28)
        12: 
        # meta: pop location
        # meta: pop location
        $(Expr(:inbounds, :pop))
        SSAValue(23) = (Base.div_float)(3.141592653589793, (Base.sitofp)(Float64, (Base.add_int)((Base.mul_int)(2, n)::Int64, 1)::Int64)::Float64)::Float64
        $(Expr(:inbounds,

In [None]:
function compute_resistance_devec2(n, reps = 100)
    # assume n and omega already defined or take
    # the following values for the optimal omega
    μ = (cos(π/(2*n)) + cos(π/(2*n + 1)))/2
    ω = 2*(1 - sqrt(1 - μ^2))/μ^2
    # (See page 409 of Strang Intro to Applied Math , this is equation 16)

    # Initialize voltages
    v = zeros(Float32, 2*n + 1, 2*n + 2)

    # Define Input Currents
    b = copy(v)
    b[n + 1, (n + 1):(n + 2)]  = [1 -1]

    # Jacobi Steps
    for k in 1:reps
        
        # even-even and odd-odd
        for j in 1:n
            for i in 1:(n - 1)
                v[2*i, 2*j]         = stencil(v, b, ω, 2*i    , 2*j)
                v[2*i + 1, 2*j + 1] = stencil(v, b, ω, 2*i + 1, 2*j + 1)
            end
            v[2*n, 2*j] = stencil(v, b, ω, 2*n, 2*j)
        end
                
        # even-odd and odd-even
        for j in 1:n
            for i in 1:(n - 1)
                v[2*i, 2*j + 1] = stencil(v, b, ω, 2*i    , 2*j + 1)
                v[2*i + 1, 2*j] = stencil(v, b, ω, 2*i + 1, 2*j)
            end
            v[2*n, 2*j + 1] = stencil(v, b, ω, 2*n, 2*j + 1)
        end
    end
    # Compute resistance = v_A - v_b = 2 v_A
    r = 2*v[n + 1, n + 1]
    return v, r
end

In [None]:
@btime compute_resistance_devec2(400);

In [31]:
v1, r1 = compute_resistance(400);
v2, r2 = compute_resistance_devec(400);

In [32]:
r1 - r2

0.0f0

In [28]:
r1

0.5002341f0

In [29]:
r2

0.5002340652149473