# User Code is Fast

**In Julia, user code and user types can be just as fast as built-in or library code.**

If you know what you're doing, you can typically match the performance of optimized C/Fortran code.

In the following, we illustrate this with two very basic examples.

## User code: Summation

In [None]:
using BenchmarkTools

In [None]:
x = rand(10^7); # some numbers to be summed up

### C

In [None]:
# What this does: compile simple C sum function into a shared library by piping C code into gcc

c_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
""";

using Libdl
Clib = tempname() * "." * Libdl.dlext

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $Clib -`, "w") do f
    print(f, c_code)
end

In [None]:
# Readily call the function `c_sum` in the shared library

c_sum(X::Array{Float64}) = @ccall Clib.c_sum(length(X)::Csize_t, X::Ptr{Float64})::Float64
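
The general shape of such a call is `@ccall library.function(arg::CType, ...)::ReturnType`. As a minimal sketch that does not depend on the library compiled above, the following calls `strlen` from the C standard library, assuming it is already loaded into the Julia process (which is the case on common platforms). The wrapper name `c_strlen` is made up for illustration.

```julia
# `c_strlen` is a hypothetical wrapper; `strlen` comes from the C standard library.
# `Cstring` passes a NUL-terminated string, `Csize_t` corresponds to C's `size_t`.
c_strlen(s::String) = @ccall strlen(s::Cstring)::Csize_t

c_strlen("Julia")  # length in bytes, returned as an unsigned integer
```

No library prefix is needed here because the symbol is looked up in the current process.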

In [None]:
c_sum(x)

In [None]:
c_sum(x) ≈ sum(x)

### Julia

In [None]:
function jl_sum(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

In [None]:
@btime c_sum($x) samples = 100 evals = 10;
@btime jl_sum($x) samples = 100 evals = 10;
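
Both loops above add the elements strictly in order. Compilers are generally not allowed to reorder floating-point additions on their own, which tends to prevent SIMD vectorization of the reduction in both languages. As a sketch of how one might opt in on the Julia side, the variant below uses `@simd` to permit reordering. The name `jl_sum_simd` and the test vector `x_demo` are assumptions for illustration and not part of the benchmark above; because the summation order changes, the result may differ from `jl_sum` in the last bits.

```julia
# Hypothetical variant of `jl_sum`; assumes `A` is an array with standard indexing.
function jl_sum_simd(A)
    s = zero(eltype(A))
    # `@simd` tells the compiler it may reorder the additions (enabling vectorization);
    # `@inbounds` skips bounds checks, which is safe here because `i` comes from `eachindex(A)`.
    @simd for i in eachindex(A)
        @inbounds s += A[i]
    end
    return s
end

x_demo = rand(10^5)
jl_sum_simd(x_demo) ≈ sum(x_demo)  # agrees up to floating-point rounding
```

Whether this is actually faster depends on the machine and on memory bandwidth, so it is worth timing with `@btime` before relying on it.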

## User type: Diagonal matrix

Let's create a simple custom `DiagonalMatrix` type that can represent square diagonal matrices, i.e.

$$ D = \left( \begin{matrix} x & 0 & 0 & 0 \\ 0 & y & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & \ddots \end{matrix} \right) $$

In [None]:
struct DiagonalMatrix{T} <: AbstractArray{T,2}
    diag::Vector{T}
end

We integrate our `DiagonalMatrix` into Julia's type hierarchy by making it a subtype of `AbstractArray{T,2}`, for which `AbstractMatrix{T}` is an alias. To make it actually behave like a matrix (a two-dimensional array), we implement (parts of) the [`AbstractArray` interface](https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-array-1).

In [None]:
import Base: getindex, setindex!, size

function getindex(D::DiagonalMatrix, i::Int, j::Int)
    if i == j
        return D.diag[i]
    else
        return zero(eltype(D))
    end
end

function setindex!(D::DiagonalMatrix, v, i::Int, j::Int)
    if i == j
        D.diag[i] = v
    else
        throw(ArgumentError("cannot set off-diagonal entry ($i, $j)"))
    end
    return v
end

function size(D::DiagonalMatrix)
    n = length(D.diag)
    return (n, n)
end

In [None]:
D = DiagonalMatrix([1,2,3])

Note how it's automagically pretty-printed (despite the fact that we never defined any special printing)!

But that's not all. All kinds of different functions now "just work".

In [None]:
D + D # addition

In [None]:
D * D # multiplication

In [None]:
inv(D) # inversion

In [None]:
using LinearAlgebra
eigen(D) # eigensolver

Why does all of this work?


Because `DiagonalMatrix <: AbstractArray`, it inherits the generic fallback methods (which rely only on `getindex`, `setindex!`, and `size`) that Julia defines for all arrays.
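
To make this concrete, here is a minimal sketch of what such a generic fallback can look like. It is not how Base implements anything in particular: `DemoDiagonal` is a hypothetical stand-in for the `DiagonalMatrix` type above (so that the block runs on its own), and `generic_trace` is an illustrative name.

```julia
# Hypothetical stand-in for `DiagonalMatrix`, with only the read-only part of the interface.
struct DemoDiagonal{T} <: AbstractMatrix{T}
    diag::Vector{T}
end

Base.size(D::DemoDiagonal) = (length(D.diag), length(D.diag))
Base.getindex(D::DemoDiagonal, i::Int, j::Int) = i == j ? D.diag[i] : zero(eltype(D))

# A generic function written only against `size` and `getindex`.
# It knows nothing about how the matrix stores its entries.
function generic_trace(A::AbstractMatrix)
    n, m = size(A)
    n == m || throw(DimensionMismatch("matrix is not square"))
    t = zero(eltype(A))
    for i in 1:n
        t += A[i, i]
    end
    return t
end

generic_trace(DemoDiagonal([1, 2, 3]))  # works for the custom type
generic_trace([1 2; 3 4])               # and for a plain `Matrix`
```

The same mechanism is why functions like `sum` or `Matrix` accept the custom type without any extra definitions.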

Of course, so far, these operations have suboptimal performance because they don't utilize the special structure inherent to our `DiagonalMatrix`. Let's implement a more efficient addition for our diagonal matrix type.

In [None]:
import Base: +

+(Da::DiagonalMatrix, Db::DiagonalMatrix) = DiagonalMatrix(Da.diag + Db.diag)
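
The same idea carries over to other operations. As a sketch under the assumption that we only care about products and inverses of two diagonal matrices, the block below adds structure-aware methods that touch only the `n` diagonal entries. `DemoDiagonal` is a hypothetical stand-in for `DiagonalMatrix` so that the block is self-contained.

```julia
# Hypothetical stand-in for `DiagonalMatrix`.
struct DemoDiagonal{T} <: AbstractMatrix{T}
    diag::Vector{T}
end

Base.size(D::DemoDiagonal) = (length(D.diag), length(D.diag))
Base.getindex(D::DemoDiagonal, i::Int, j::Int) = i == j ? D.diag[i] : zero(eltype(D))

# The product of two diagonal matrices is diagonal: multiply the diagonals elementwise.
# Broadcasting throws a `DimensionMismatch` if the sizes differ.
Base.:*(Da::DemoDiagonal, Db::DemoDiagonal) = DemoDiagonal(Da.diag .* Db.diag)

# The inverse of a diagonal matrix is the elementwise inverse of its diagonal.
# Note: this sketch does not check for zeros on the diagonal (singular matrices).
Base.inv(D::DemoDiagonal) = DemoDiagonal(inv.(D.diag))

D_demo = DemoDiagonal([1.0, 2.0, 4.0])
D_demo * D_demo
inv(D_demo)
```

Both methods return a `DemoDiagonal` again, so the compact representation is preserved across operations instead of falling back to a dense `Matrix`.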

Let's compare our very rudimentary `DiagonalMatrix` against the standard `Diagonal` type that ships with the `LinearAlgebra` standard library.

In [None]:
using BenchmarkTools
using LinearAlgebra

x = rand(1000);
Djl = Diagonal(x)
D = DiagonalMatrix(x)

@btime $Djl + $Djl;
@btime $D + $D;

There is nothing special about built-in types. In fact, [they are implemented in essentially the same way](https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/diagonal.jl#L5)!