## Part 2: Neural Networks


### Neurons
A neuron is just a container for weights, a bias, and an activation function. It takes a number of inputs, combines them, and produces a scalar output. The number of weights equals the number of inputs (features) to that neuron. The activation function lets the neuron map inputs to outputs nonlinearly; without it, stacked layers of neurons would reduce to a single linear combination. The code defining a neuron is here:

```julia
    mutable struct Neuron{T<:AbstractFloat,F<:Union{Function,Nothing}}
        w::Vector{Value{T}}
        b::Value{T}
        activation::F
    end
```
We can define a function to make variables of this type "callable", as in `n(x)`:

```julia
    function (n::Neuron)(x)
        if length(x) != length(n.w)
            error("In calling n(x), expected $(length(n.w)) inputs to neuron, got $(length(x)):\n\tx = $(x)")
        end
        raw = sum(n.w .* x) + n.b
        isnothing(n.activation) && return raw
        return n.activation(raw)
    end
```
This just computes the inner product of `x` and the weights, adds `b`, and then, if the neuron has an activation function, runs the result through it.
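
The same forward pass can be sketched on plain `Float64` values, without the `Value` type or any gradient tracking. The function name `neuron_forward` and the weights below are made up for illustration and are not part of the package:

```julia
using LinearAlgebra: dot

# Hypothetical plain-Float64 stand-in for the package's Neuron (no autodiff).
relu(z) = max(z, zero(z))

function neuron_forward(w::Vector{Float64}, b::Float64, x::Vector{Float64}, activation=nothing)
    length(x) == length(w) ||
        error("expected $(length(w)) inputs to neuron, got $(length(x))")
    raw = dot(w, x) + b                       # inner product plus bias
    return isnothing(activation) ? raw : activation(raw)
end

w = [0.5, -1.0, 2.0]
b = 0.1
x = [1.0, 2.0, 1.0]

linear_out = neuron_forward(w, b, x)          # 0.5 - 2.0 + 2.0 + 0.1
relu_out   = neuron_forward(-w, -b, x, relu)  # raw value is negative, so relu clamps it to 0
```

With these numbers the linear neuron returns roughly 0.6, while the relu neuron with the signs flipped returns 0.0.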

In [1]:
using Micrograd

## make a linear neuron with three inputs and call it
n = neuron(3,nothing)
x = [1.0,2.0,1.0]
o = n(x)
println(o)

-1.2 (gr: 0.0, op: +)


In [2]:
# now let's look at the tree
nodes,depth = buildgraph(o)
printgraph(nodes,depth)


Tree:
----- 

-1.2 (gr: 0.0, op: +)   
|------------------------|
-0.84 (gr: 0.0, op:  )   -0.39 (gr: 0.0, op: +)   
                         |----------------------------------------------|
                         0.02 (gr: 0.0, op: *)                          -0.41 (gr: 0.0, op: +)   
                         |----------------------|                       |-----------------------------------------------|
                         1.0 (gr: 0.0, op:  )   0.02 (gr: 0.0, op:  )   -0.98 (gr: 0.0, op: *)                          0.58 (gr: 0.0, op: *)   
                                                                        |----------------------|                        |----------------------|
                                                                        2.0 (gr: 0.0, op:  )   -0.49 (gr: 0.0, op:  )   1.0 (gr: 0.0, op:  )   0.58 (gr: 0.0, op:  )   

In [3]:
# you can't give the wrong number of inputs
try
    n([1.0,2.0])
catch e
    println(e)
    println("\n ^ correctly throws error")
end

ErrorException("In calling n(x), expected 3 inputs to neuron, got 2:\n\tx = [1.0, 2.0]")

 ^ correctly throws error


In [4]:
## make a relu activation neuron with three inputs and call it
# note the op here is now relu instead of +
# run this multiple times to see relu effect on a negative value
n = neuron(3,relu)
x = [1.0,2.0,1.0]
o = n(x) 

nodes,depth = buildgraph(o)
printgraph(nodes,depth)



Tree:
----- 

1.0 (gr: 0.0, op: relu)   
|
1.0 (gr: 0.0, op: +)   
|----------------------|
0.9 (gr: 0.0, op:  )   0.11 (gr: 0.0, op: +)   
                       |------------------------------------------------|
                       -0.028 (gr: 0.0, op: *)                          0.14 (gr: 0.0, op: +)   
                       |----------------------|                         |-----------------------------------------------|
                       1.0 (gr: 0.0, op:  )   -0.028 (gr: 0.0, op:  )   -0.43 (gr: 0.0, op: *)                          0.57 (gr: 0.0, op: *)   
                                                                        |----------------------|                        |----------------------|
                                                                        2.0 (gr: 0.0, op:  )   -0.22 (gr: 0.0, op:  )   1.0 (gr: 0.0, op:  )   0.57 (gr: 0.0, op:  )   
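
The claim at the start of this section, that neurons without an activation reduce to a single linear combination, can be checked numerically. This is a minimal sketch with two hypothetical single-input neurons written as plain functions; the weights are arbitrary:

```julia
relu(z) = max(z, zero(z))

# Two hypothetical single-input neurons (arbitrary weights and biases).
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

stacked_linear(x) = w2 * (w1 * x + b1) + b2          # no activation in between
collapsed(x)      = (w2 * w1) * x + (w2 * b1 + b2)   # one equivalent neuron
stacked_relu(x)   = w2 * relu(w1 * x + b1) + b2      # relu in between

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Stacking two linear neurons gives the same outputs as a single neuron.
same_as_one_neuron = all(stacked_linear.(xs) .≈ collapsed.(xs))

# A single neuron has a constant slope; the relu stack does not.
slopes(f) = diff(f.(xs)) ./ diff(xs)
linear_is_affine = all(slopes(stacked_linear) .≈ first(slopes(stacked_linear)))
relu_is_affine   = all(slopes(stacked_relu) .≈ first(slopes(stacked_relu)))
```

Here `same_as_one_neuron` and `linear_is_affine` come out `true`, while `relu_is_affine` is `false`: the relu stack is flat for small inputs and slopes downward for larger ones, which no single linear neuron can reproduce.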

### Layers
Layers are just groups of neurons that all receive the same inputs (at least for the fully connected layers used here). A layer maps an N-dimensional input to an M-dimensional output, where N is the number of inputs to each neuron in the layer and M is the number of neurons in the layer (recall that each neuron has one output).

The code is here:
```julia
mutable struct Layer{T<:AbstractFloat,F<:Union{Function,Nothing}}
    neurons::Vector{Neuron{T,F}}
    inputs::Int
    outputs::Int
end
```

and again we make the layer callable; a layer with a single neuron returns its one output directly rather than a one-element vector:

```julia
function (l::Layer)(x)
    out = [n(x) for n in l.neurons]
    length(out) == 1 && return out[1]
    return out
end
```
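
The N-to-M mapping can also be seen by stacking the neurons' weights as rows of a matrix. This is a sketch on plain `Float64` values with made-up weights, and it leaves the activation out; it is not how the package stores a layer:

```julia
using LinearAlgebra: dot

# Hypothetical layer with N = 3 inputs and M = 2 neurons.
weights = [[0.5, -1.0, 2.0],     # weights of neuron 1
           [1.0,  0.0, -1.0]]    # weights of neuron 2
biases  = [0.1, -0.2]
x = [1.0, 2.0, 1.0]

# Neuron by neuron, mirroring `[n(x) for n in l.neurons]`.
per_neuron = [dot(w, x) + b for (w, b) in zip(weights, biases)]

# Same computation with the weights stacked as rows of an M×N matrix.
W = permutedims(reduce(hcat, weights))   # 2×3
as_matrix = W * x .+ biases
```

Both `per_neuron` and `as_matrix` hold two values, one per neuron, computed from the same three inputs.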

In [5]:
## make a layer of a single neuron with relu activation
# this is just a layer wrapper around one neuron, to check that the API works and doesn't break the printing
l = layer(3,1,relu)
o = l(x)

nodes,depth = buildgraph(o)
printgraph(nodes,depth)


Tree:
----- 

2.8 (gr: 0.0, op: relu)   
|
2.8 (gr: 0.0, op: +)   
|-----------------------|
0.91 (gr: 0.0, op:  )   1.9 (gr: 0.0, op: +)   
                        |----------------------------------------------|
                        0.64 (gr: 0.0, op: *)                          1.3 (gr: 0.0, op: +)   
                        |----------------------|                       |----------------------------------------------|
                        1.0 (gr: 0.0, op:  )   0.64 (gr: 0.0, op:  )   1.3 (gr: 0.0, op: *)                           -0.074 (gr: 0.0, op: *)   
                                                                       |----------------------|                       |----------------------|
                                                                       2.0 (gr: 0.0, op:  )   0.67 (gr: 0.0, op:  )   1.0 (gr: 0.0, op:  )   -0.074 (gr: 0.0, op:  )   

### Multilayer Perceptrons

These are simply lists of layers where the output of one is the input to the next. This means each layer's number of inputs has to match the previous layer's number of outputs.

Here's the simple definition of the MLP type:
```julia
mutable struct MLP
    layers::Vector{Layer}
end
```

The constructor takes the number of inputs (the number of features in the data set) and a list of neuron counts, one per layer, and makes sure the ins and outs line up.

```julia
function mlp(n_in,n_outs,act=relu)
    n_vec = [n_in;n_outs]
    n_layers = length(n_outs)
    # every layer uses `act` except the last, which is linear (no activation)
    MLP([layer(n_vec[i],n_vec[i+1], i != n_layers ? act : nothing) for i in 1:n_layers])
end
```
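
How the constructor pairs up sizes can be sketched without building any neurons. The helper `layer_specs` below is hypothetical and only lists what each layer would receive; it uses symbols in place of the actual activation functions:

```julia
# Hypothetical helper (not part of the package): list each layer's
# (inputs, outputs, activation) the way the comprehension in `mlp` pairs them.
function layer_specs(n_in::Int, n_outs::Vector{Int}, act=:relu)
    n_vec = [n_in; n_outs]          # e.g. [2, 16, 16, 1]
    n_layers = length(n_outs)
    return [(n_vec[i], n_vec[i+1], i == n_layers ? :linear : act) for i in 1:n_layers]
end

specs = layer_specs(2, [16, 16, 1])

# Each layer's output count matches the next layer's input count.
lined_up = all(specs[i][2] == specs[i+1][1] for i in 1:length(specs)-1)
```

For `layer_specs(2, [16, 16, 1])` this gives `(2, 16, :relu)`, `(16, 16, :relu)`, and `(16, 1, :linear)`.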



At the end, we need to make sure the output maps to the correct range for our problem. That's why the last layer has a linear output: by default, we don't want an activation function to truncate the output range.


### Fitting a NN

Instead of showing a simple MLP which is impossible to print with the tools in this package, we'll define the same one used in Micrograd to classify the sklearn moons data.

This is an MLP that takes in 2D points and predicts a binary class. It uses 3 layers of 16 (relu), 16 (relu), and 1 (linear) neurons. The final linear layer lets the output go negative, which hinge loss with labels of -1 and 1 needs. Try setting it to relu and notice it won't fit.
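
To see why a relu output can't fit, here is a sketch of the standard hinge loss on made-up scores. The package's `fit` may compute its loss differently (for example with added regularization), so this only illustrates the idea:

```julia
relu(z) = max(z, zero(z))

# Standard hinge loss for labels in {-1, +1}: mean of max(0, 1 - y * score).
# Assumption: illustrative only; not necessarily the loss used inside `fit`.
hinge(scores, labels) = sum(relu.(1 .- labels .* scores)) / length(labels)

labels = [-1.0, -1.0, 1.0, 1.0]
linear_scores = [-2.0, -1.5, 1.2, 3.0]   # a linear output can go negative
relu_scores   = relu.(linear_scores)     # a relu output cannot

loss_linear = hinge(linear_scores, labels)
loss_relu   = hinge(relu_scores, labels)
```

With a relu output the score is never negative, so every sample labeled -1 contributes at least 1 to the loss no matter how the weights change. In this example `loss_linear` is 0.0 and `loss_relu` is 0.5.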

In [6]:
X,y = getmoons()
m = mlp(2, [16, 16, 1])


fit(m,X,y)

Step: 0	Loss: 2.5	Accuracy: 0.34
Step: 1	Loss: 4.3	Accuracy: 0.5
Step: 2	Loss: 1.1	Accuracy: 0.5
Step: 3	Loss: 1.3	Accuracy: 0.5
Step: 4	Loss: 0.88	Accuracy: 0.5
Step: 5	Loss: 0.59	Accuracy: 0.83
Step: 6	Loss: 0.41	Accuracy: 0.86
Step: 7	Loss: 0.27	Accuracy: 0.89
Step: 8	Loss: 0.24	Accuracy: 0.91
Step: 9	Loss: 0.25	Accuracy: 0.89
Step: 10	Loss: 0.25	Accuracy: 0.9
Step: 11	Loss: 0.31	Accuracy: 0.88
Step: 12	Loss: 0.21	Accuracy: 0.93
Step: 13	Loss: 0.23	Accuracy: 0.9
Step: 14	Loss: 0.17	Accuracy: 0.93
Step: 15	Loss: 0.17	Accuracy: 0.94
Step: 16	Loss: 0.16	Accuracy: 0.94
Step: 17	Loss: 0.2	Accuracy: 0.92
Step: 18	Loss: 0.12	Accuracy: 0.97
Step: 19	Loss: 0.13	Accuracy: 0.95
Step: 20	Loss: 0.13	Accuracy: 0.96
Step: 21	Loss: 0.19	Accuracy: 0.93
Step: 22	Loss: 0.2	Accuracy: 0.94
Step: 23	Loss: 0.17	Accuracy: 0.94
Step: 24	Loss: 0.1	Accuracy: 0.96
Step: 25	Loss: 0.097	Accuracy: 0.96
Step: 26	Loss: 0.14	Accuracy: 0.95
Step: 27	Loss: 0.21	Accuracy: 0.92
Step: 28	Loss: 0.13	Accuracy: 0.94
Step: 29	Loss: 0.072	Accuracy: 0.98
Step: 30	Loss: 0.071	Accuracy: 0.97
Step: 31	Loss: 0.11	Accuracy: 0.96
Step: 32	Loss: 0.18	Accuracy: 0.94
Step: 33	Loss: 0.16	Accuracy: 0.93
Step: 34	Loss: 0.058	Accuracy: 0.98
Step: 35	Loss: 0.052	Accuracy: 0.99
Step: 36	Loss: 0.074	Accuracy: 1.0
Step: 37	Loss: 0.083	Accuracy: 0.96
Step: 38	Loss: 0.044	Accuracy: 1.0
Step: 39	Loss: 0.046	Accuracy: 0.98
Step: 40	Loss: 0.056	Accuracy: 1.0
Step: 41	Loss: 0.11	Accuracy: 0.96
Step: 42	Loss: 0.053	Accuracy: 0.99
Step: 43	Loss: 0.047	Accuracy: 1.0
Step: 44	Loss: 0.069	Accuracy: 0.97
Step: 45	Loss: 0.03	Accuracy: 1.0
Step: 46	Loss: 0.033	Accuracy: 1.0
Step: 47	Loss: 0.042	Accuracy: 1.0
Step: 48	Loss: 0.072	Accuracy: 0.97
Step: 49	Loss: 0.029	Accuracy: 1.0
Step: 50	Loss: 0.033	Accuracy: 1.0
Step: 51	Loss: 0.055	Accuracy: 0.98
Step: 52	Loss: 0.023	Accuracy: 1.0
Step: 53	Loss: 0.024	Accuracy: 1.0
Step: 54	Loss: 0.035	Accuracy: 1.0
Step: 55	Loss: 0.065	Accuracy: 0.97
Step: 56	Loss: 0.026	Accuracy: 1.0
Step: 57	Loss: 0.034	Accuracy: 1.0
Step: 58	Loss: 0.059	Accuracy: 0.97
Step: 59	Loss: 0.025	Accuracy: 1.0
Step: 60	Loss: 0.029	Accuracy: 1.0
Step: 61	Loss: 0.053	Accuracy: 0.98
Step: 62	Loss: 0.027	Accuracy: 1.0
Step: 63	Loss: 0.021	Accuracy: 1.0
Step: 64	Loss: 0.022	Accuracy: 1.0
Step: 65	Loss: 0.019	Accuracy: 1.0
Step: 66	Loss: 0.021	Accuracy: 1.0
Step: 67	Loss: 0.019	Accuracy: 1.0
Step: 68	Loss: 0.02	Accuracy: 1.0
Step: 69	Loss: 0.021	Accuracy: 1.0
Step: 70	Loss: 0.019	Accuracy: 1.0
Step: 71	Loss: 0.02	Accuracy: 1.0
Step: 72	Loss: 0.018	Accuracy: 1.0
Step: 73	Loss: 0.019	Accuracy: 1.0
Step: 74	Loss: 0.019	Accuracy: 1.0
Step: 75	Loss: 0.019	Accuracy: 1.0
Step: 76	Loss: 0.018	Accuracy: 1.0
Step: 77	Loss: 0.018	Accuracy: 1.0
Step: 78	Loss: 0.018	Accuracy: 1.0
Step: 79	Loss: 0.018	Accuracy: 1.0
Step: 80	Loss: 0.018	Accuracy: 1.0
Step: 81	Loss: 0.018	Accuracy: 1.0
Step: 82	Loss: 0.018	Accuracy: 1.0
Step: 83	Loss: 0.018	Accuracy: 1.0
Step: 84	Loss: 0.018	Accuracy: 1.0
Step: 85	Loss: 0.018	Accuracy: 1.0
Step: 86	Loss: 0.018	Accuracy: 1.0
Step: 87	Loss: 0.018	Accuracy: 1.0
Step: 88	Loss: 0.018	Accuracy: 1.0
Step: 89	Loss: 0.017	Accuracy: 1.0
Step: 90	Loss: 0.017	Accuracy: 1.0
Step: 91	Loss: 0.017	Accuracy: 1.0
Step: 92	Loss: 0.017	Accuracy: 1.0
Step: 93	Loss: 0.017	Accuracy: 1.0
Step: 94	Loss: 0.017	Accuracy: 1.0
Step: 95	Loss: 0.017	Accuracy: 1.0
Step: 96	Loss: 0.017	Accuracy: 1.0
Step: 97	Loss: 0.017	Accuracy: 1.0
Step: 98	Loss: 0.017	Accuracy: 1.0
Step: 99	Loss: 0.017	Accuracy: 1.0


In [7]:
# ok there's no plotting library. this is not quite as satisfying.
y_fit = m.(X)
cat(y,sign.(getfield.(y_fit,:data)),dims=2)

100×2 Matrix{Float64}:
 -1.0  -1.0
 -1.0  -1.0
  1.0   1.0
 -1.0  -1.0
  1.0   1.0
  1.0   1.0
  1.0   1.0
  1.0   1.0
  1.0   1.0
 -1.0  -1.0
  ⋮    
  1.0   1.0
  1.0   1.0
  1.0   1.0
  1.0   1.0
  1.0   1.0
 -1.0  -1.0
  1.0   1.0
  1.0   1.0
 -1.0  -1.0
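
The side-by-side comparison above can be condensed into a single accuracy number. This sketch uses made-up labels and scores; in the notebook they would be `y` and `getfield.(y_fit,:data)`:

```julia
# Hypothetical labels and raw model scores, for illustration only.
labels = [-1.0, -1.0, 1.0, -1.0, 1.0]
scores = [-0.8, -2.1, 0.4, 0.3, 1.7]

predictions = sign.(scores)                                  # -1.0 or 1.0 per sample
accuracy = count(predictions .== labels) / length(labels)    # fraction of matches
```

In this made-up example four of the five signs match, so `accuracy` is 0.8.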

In [8]:
# but maybe the decision boundary is cooler
xg = collect(range(-2.0,stop=2.0,length=25))
yg = collect(range(2.0,stop=-2.0,length=25))

yi_old = 0
str = ""
for (yi,y) in enumerate(collect(yg))
    for (xi,x) in enumerate(collect(xg))
        if yi!=yi_old
            str = str*"\n"
        end
        o = m([x,y])
        if sign(o.data) == 1
            str=str*"+  "
        else
            str=str*"-  "
        end
        yi_old = yi
    end
end
print(str)


-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  +  -  -  -  -  -  -  -  -  -  -  -  +  
-  -  -  -  -  -  -  -  -  -  -  +  +  +  +  -  -  -  -  -  -  -  -  +  +  
-  -  -  -  -  -  -  -  -  -  +  +  +  +  +  +  -  -  -  -  -  -  +  +  +  
-  -  -  -  -  -  -  -  -  +  +  +  +  +  +  +  +  -  -  -  -  +  +  +  +  
-  -  -  -  -  -  -  -  -  +  +  +  +  +  +  +  +  +  -  -  -  +  +  +  +  
-  -  -  - 
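
The grid rendering can also be written as a function that returns the string. This is a sketch under an assumption: `score` below is a made-up classifier (positive inside the unit circle) standing in for `m([x,y]).data`, and `ascii_boundary` is not part of the package:

```julia
# Hypothetical stand-in for the trained model: positive inside the unit circle.
score(x, y) = 1.0 - (x^2 + y^2)

function ascii_boundary(f; lo=-2.0, hi=2.0, n=5)
    xs = range(lo, stop=hi, length=n)
    ys = range(hi, stop=lo, length=n)   # top row is the largest y
    rows = [join((f(x, y) > 0 ? "+" : "-" for x in xs), "  ") for y in ys]
    return join(rows, "\n")
end

grid = ascii_boundary(score)
print(grid)
```

Building each row with `join` avoids tracking the previous row index by hand. On the coarse 5×5 grid used here only the center point is strictly positive.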

In [10]:
# let's give it some color!
xg = collect(range(-2.0,stop=2.0,length=25))
yg = collect(range(2.0,stop=-2.0,length=25))

for y in yg
    print("\n")   # start a new row
    for x in xg
        o = m([x,y])
        if sign(o.data) == 1
            printstyled("+  ",color=:blue)
        else
            printstyled("-  ",color=:red)
        end
    end
end


-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
-  -  