# Simple network to minimize CRPS

**Steps (to be updated)**
1. Set up network structure (first here, then probably in separate module)
2. Try feeding some data to the model and sanity-check the output
3. Run the network on the full dataset.

The EMOS analog is a simple network like this:

![title](EMOS_network.png)

I will try to build a basic model like this in Keras, since this is the simplest library. There are two complications: 

1. This is not a fully connected layer, so we have to find a workaround, but I think something like this should work: https://github.com/fchollet/keras/issues/3919
2. We need to write a custom CRPS loss function. We should probably use Theano as the backend to make it compatible with Kai P.'s code. Examples of Keras loss functions: https://keras.io/losses/ or https://github.com/fchollet/keras/issues/369
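
As a rough sketch of what the network in the figure computes, assuming it applies one linear map to the ensemble mean and a separate one to the ensemble standard deviation: the function name `emos_forward` and its argument names are made up for illustration, and the parameter values are the initial ones used in the cells further down.

```python
def emos_forward(ens_mean, ens_std, a, b, c, d):
    """Hypothetical helper: map ensemble mean/std to a predicted mu/sigma."""
    mu = a + b * ens_mean      # the ensemble mean only feeds mu
    sigma = c + d * ens_std    # the ensemble std only feeds sigma
    return mu, sigma


mu, sigma = emos_forward(3.0, 5.0, a=0.5, b=2.0, c=0.5, d=2.0)
print(mu, sigma)   # 6.5 10.5
```

Because the mean never touches sigma and the std never touches mu, this is not a fully connected layer, which is the first complication listed above.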

In [1]:
import theano
import numpy as np

So why do we have to write that function separately? Theano functions are hooks for interacting with the graph... aha.



In [30]:
# Ok here we go... http://www.marekrei.com/blog/theano-tutorial/

# Let's create variables for the input
meanx = theano.tensor.fscalar('meanx')   # This is for 32 bit floats

# Now the weights and biases a and b
a = theano.shared(np.asarray(0.5), 'a')  # The explicit names help with debugging
b = theano.shared(np.asarray(2.), 'b')  # Make sure this is a float!!!

# Let's define mu, this sets up the graph
mu = meanx * b + a

# And a function that takes input and returns output
# What happens without the brackets? Error: Input variables of a Theano function should be contained in a list, even when there is a single input.
f = theano.function([meanx], mu)

# Let's evaluate the output
# Can I leave out the brackets here?
# With meanx declared as a vector (fvector), no:
# Wrong number of dimensions: expected 1, got 0 with shape ().
# With the fscalar defined above, a plain number works.
out = f(1)

# For a vector input, f takes either a 1D float32 numpy array or a list

In [31]:
out

array(2.5)

In [21]:
# Let's now actually train.

# Does my graph still exist?
# (An array input like this requires meanx to be declared as an fvector.)
f(np.asarray([1, 2], dtype='float32'))

array([ 2.5,  4.5])

In [32]:
# Yes. 

# Define a target
target = theano.tensor.fscalar('target')

# Define a loss/cost function, simple squared distance
cost = theano.tensor.sqr(target - mu)
# The cost needs to be a scalar, so for now we define the input x as a scalar.
# Later we can define both x and target as vectors.

# Compute partial derivatives of cost function with respect to weights
gradients = theano.tensor.grad(cost, [a, b])

In [38]:
# Now we create updated variables
a_updated = a - (0.01 * gradients[0])
b_updated = b - (0.01 * gradients[1])

# And write a list of updates that replaces each old variable with its updated value
updates = [(a, a_updated), (b, b_updated)]

# Now let's define a function again that also applies the updates
f = theano.function([meanx, target], mu, updates=updates)

In [41]:
# Now let's train
for i in range(100):
    out = f(5, 2)   # Start with input 5 and target 2
    if i%10 == 0: print(out)

5527.95796295493
5.587731639540578
2.0023293369952597
2.000001512323491
2.000000000981877
2.0000000000006377
2.0000000000000004
2.0
2.0
2.0


So it works, somewhat, but I feel like this is very clumsy and I don't actually want to write the code this way. But before looking at better code, I should maybe try to include the variance and CRPS cost function.

But wait, why was the initial guess so far off? Something might be wrong here.
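
One possible explanation, offered as a guess rather than a diagnosis: `a` and `b` are shared variables, so they keep whatever values earlier runs of the training cell left behind unless they are re-created. The plain-Python sketch below redoes the same update rule by hand (the gradients are worked out manually, and `train_mean` is a name made up here). Starting from the fresh values `a = 0.5` and `b = 2`, the first output should be 10.5, not several thousand.

```python
def train_mean(x, target, a=0.5, b=2.0, lr=0.01, steps=100):
    """Hand-written gradient descent on cost = (target - mu)**2 with mu = x * b + a."""
    outputs = []
    for _ in range(steps):
        mu = x * b + a
        outputs.append(mu)          # like the Theano function, report mu before the update
        err = target - mu
        grad_a = -2.0 * err         # d cost / d a
        grad_b = -2.0 * err * x     # d cost / d b
        a -= lr * grad_a
        b -= lr * grad_b
    return outputs


outs = train_mean(x=5.0, target=2.0)
print(outs[0])     # 10.5 with fresh parameters
print(outs[-1])    # very close to the target 2.0
```

With this update rule and x = 5, each step shrinks the error by a factor of 1 - 0.02 * (1 + 5^2) = 0.48. That is consistent with the drop between the first two printed values (5527.96 to 5.59 over ten steps), so the optimizer itself seems to behave; only the starting point looks stale.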

In [53]:
# Copy the mean part from above and add the std

# Let's create variables for the input
meanx = theano.tensor.fscalar('meanx')   # This is for 32 bit floats
stdx = theano.tensor.fscalar('stdx')

# Now the weights and biases a and b
a = theano.shared(np.asarray(0.5), 'a')  # The explicit names help with debugging
b = theano.shared(np.asarray(2.), 'b')  # Make sure this is a float!!!
c = theano.shared(np.asarray(0.5), 'c')
d = theano.shared(np.asarray(2.), 'd') 

# Let's define mu and also sigma
mu = meanx * b + a
sigma = stdx * d + c

In [54]:
import theano.tensor as T
# Now the target which is still a scalar
target = theano.tensor.fscalar('target')

# And here comes the cost function
# First a little helper variable to keep the equation short

# Will this simple fix be enough to avoid the negative std problem?
var = T.sqr(sigma)

loc = (target - mu) / T.sqrt(var)

# This is now copied from Kai P.'s code
phi = 1.0 / np.sqrt(2.0 * np.pi) * T.exp(-T.square(loc) / 2.0)
Phi = 0.5 * (1.0 + T.erf(loc / np.sqrt(2.0)))

crps =  T.sqrt(var) * (loc * (2. * Phi - 1.) + 2 * phi - 1. / np.sqrt(np.pi))

# Now compute the gradients
gradients = theano.tensor.grad(crps, [a, b, c, d])
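
Written out, the expression coded above corresponds to the closed-form CRPS of a Gaussian forecast. Here $y$ is the target, $z$ is the variable `loc`, $\varphi$ and $\Phi$ are the standard normal PDF and CDF, and $\sigma$ stands for `T.sqrt(var)`, i.e. the absolute value of the network output `sigma`:

```latex
z = \frac{y - \mu}{\sigma}, \qquad
\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \qquad
\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right]

\mathrm{CRPS}(\mu, \sigma; y) = \sigma \left[ z \left( 2\Phi(z) - 1 \right) + 2\varphi(z) - \frac{1}{\sqrt{\pi}} \right]
```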


In [55]:
# Let's now define the updates
lr = 0.01   # Learning rate

a_updated = a - (lr * gradients[0])
b_updated = b - (lr * gradients[1])
c_updated = c - (lr * gradients[2])
d_updated = d - (lr * gradients[3])

updates = [(a, a_updated), (b, b_updated), (c, c_updated), (d, d_updated)]

# Ok, so what does the function do:
# The first argument, the list, defines the inputs;
# the second argument defines the outputs, and it can be a list of several.
f = theano.function([meanx, stdx, target], [mu, sigma, crps], 
                    updates=updates)

In [56]:
# Now let's do some training
for i in range(100):
    out = f(3, 5, 2)   # Start with input: [meanx = 3, stdx = 5, target = 2]
    if i%10 == 0: print(out)

[array(6.5), array(10.5), array(3.2116223906950925)]
[array(6.172896841328296), array(10.069281985719517), array(3.0333349789339605)]
[array(5.856064480234552), array(9.627477198409583), array(2.8579359052154905)]
[array(5.549384423566165), array(9.175080822190472), array(2.6852198566143)]
[array(5.252767827338031), array(8.712543922329491), array(2.51500267424618)]
[array(4.966156339250551), array(8.240277144289273), array(2.347118863001946)]
[array(4.689523638032825), array(7.758653774689849), array(2.1814194874355133)]
[array(4.422877827258022), array(7.268012205597393), array(2.0177703969119394)]
[array(4.166264920204845), array(6.768657816241148), array(1.8560507365893308)]
[array(3.919773772063541), array(6.260864253742679), array(1.6961517129811856)]


In [57]:
# Yes, let's do some more training
for i in range(1000):
    out = f(3, 5, 2)   # Start with input: [meanx = 3, stdx = 5, target = 2]
    if i%100 == 0: print(out)

[array(3.6835430045169386), array(5.744874051278793), array(1.5379755949408345)]
[array(2.0110420472174706), array(0.1576020372779714), array(0.037139314934789745)]
[array(1.9921016365325739), array(0.0549723980200141), array(0.013298726433425246)]
[array(2.0054474051467346), array(-0.013570682845248927), array(0.004032216969526256)]
[array(2.0180540055771665), array(-0.10893616183902224), array(0.026648781552628186)]
[array(2.0964848350399143), array(0.147454903282231), array(0.0587844456704483)]
[array(1.986253381720927), array(-0.0015220247658093433), array(0.012887907760301742)]
[array(1.9559260455422292), array(0.12170837275047497), array(0.03474123110034233)]
[array(2.0058490039752845), array(-0.046442732848182636), array(0.011146916184997418)]
[array(2.0493908057584203), array(-0.1084959531987393), array(0.03417312722588535)]


Ok wow, what happened here? In the first attempt it got to almost perfect, but then skyrocketed off somewhere... How can the CRPS be negative? Because I get negative standard deviations. This means I actually have to put a square in there somewhere, so let's use the variance instead (the `var = T.sqr(sigma)` line above). This looks good: sigma can still go negative, but the CRPS stays positive. Thanks Kai!
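
To make the sign problem concrete, here is a plain-Python sketch of the same formula using only the standard library (the helper names `bracket`, `crps_raw` and `crps_abs` are invented for this illustration). It assumes the negative scores came from multiplying by a negative `sigma` directly; taking `sqrt(sigma**2)`, i.e. the absolute value, keeps the score non-negative.

```python
import math


def bracket(loc):
    """The term in parentheses of the Gaussian CRPS formula."""
    phi = math.exp(-loc ** 2 / 2.0) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    Phi = 0.5 * (1.0 + math.erf(loc / math.sqrt(2.0)))           # standard normal CDF
    return loc * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / math.sqrt(math.pi)


def crps_raw(mu, sigma, target):
    """Uses sigma as is, so a negative sigma flips the sign of the score."""
    return sigma * bracket((target - mu) / sigma)


def crps_abs(mu, sigma, target):
    """Uses sqrt(sigma**2) = |sigma|, as in the cell with var = T.sqr(sigma)."""
    s = math.sqrt(sigma ** 2)
    return s * bracket((target - mu) / s)


print(crps_abs(6.5, 10.5, 2.0))    # roughly 3.21, the first value printed above
print(crps_raw(2.0, -0.1, 2.0))    # negative
print(crps_abs(2.0, -0.1, 2.0))    # positive
```

The bracketed term is symmetric in `loc` and always positive, so the sign of the score is decided entirely by the factor in front of it.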

So for now we can only put in scalars, while in reality we would like to put in arrays. So let's modify our code to allow for several forecasts and observations.

In [58]:
# Copy again!
# But let's use a 1D vector this time
meanx = theano.tensor.fvector('meanx')   # This is for 32 bit floats
stdx = theano.tensor.fvector('stdx')

# Now the weights and biases a and b
a = theano.shared(np.asarray(0.5), 'a')  # The explicit names help with debugging
b = theano.shared(np.asarray(2.), 'b')  # Make sure this is a float!!!
c = theano.shared(np.asarray(0.5), 'c')
d = theano.shared(np.asarray(2.), 'd') 

# Let's define mu and also sigma
mu = meanx * b + a
sigma = stdx * d + c

In [59]:
# Now we also have to change the target input to a vector
target = theano.tensor.fvector('target')

In [60]:
# The cost function still has to return a scalar, so we will take the mean 
var = T.sqr(sigma)
loc = (target - mu) / T.sqrt(var)
# This is now copied from Kai P.'s code
phi = 1.0 / np.sqrt(2.0 * np.pi) * T.exp(-T.square(loc) / 2.0)
Phi = 0.5 * (1.0 + T.erf(loc / np.sqrt(2.0)))

crps =  T.sqrt(var) * (loc * (2. * Phi - 1.) + 2 * phi - 1. / np.sqrt(np.pi))
CRPS = T.mean(crps)

# Now compute the gradients
gradients = theano.tensor.grad(CRPS, [a, b, c, d])
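
For reference, the same mean-CRPS cost can be sketched without Theano, using only the standard library. The names `gaussian_crps` and `mean_crps` are made up for this illustration, and `abs(sigma)` stands in for `T.sqrt(T.sqr(sigma))`.

```python
import math


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma**2) forecast for observation y."""
    s = abs(sigma)                      # same effect as sqrt(sigma**2)
    z = (y - mu) / s
    pdf = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return s * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(means, stds, targets, a, b, c, d):
    """Average the per-sample CRPS so the cost is a single scalar."""
    scores = [
        gaussian_crps(a + b * m, c + d * s, y)
        for m, s, y in zip(means, stds, targets)
    ]
    return sum(scores) / len(scores)


# Two identical samples give the same mean as a single one
one = mean_crps([3.0], [5.0], [2.0], 0.5, 2.0, 0.5, 2.0)
two = mean_crps([3.0, 3.0], [5.0, 5.0], [2.0, 2.0], 0.5, 2.0, 0.5, 2.0)
print(one, two)
```

Averaging rather than summing keeps the cost, and therefore the gradient size, roughly independent of the number of samples, so the same learning rate can be reused when the data set grows.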

In [70]:
# Let's now define the updates
lr = 0.01   # Learning rate

a_updated = a - (lr * gradients[0])
b_updated = b - (lr * gradients[1])
c_updated = c - (lr * gradients[2])
d_updated = d - (lr * gradients[3])

updates = [(a, a_updated), (b, b_updated), (c, c_updated), (d, d_updated)]

# Ok, so what does the function do:
# The first argument, the list, defines the inputs;
# the second argument defines the outputs, and it can be a list of several.

# Let's now actually return the four weights and the mean CRPS
f = theano.function([meanx, stdx, target], [a, b, c, d, CRPS], 
                    updates=updates)

In [71]:
# Let's define some input arrays
nb_data = 100  # Number of data
in_meanx = np.asarray(np.random.randn(nb_data) + 3, dtype='float32')   # Random with mean 3 and std 1
in_stdx = np.asarray(2 * np.random.randn(nb_data) + 1, dtype='float32')
in_target = np.asarray(1.5 * np.random.randn(nb_data) + 2, dtype='float32')


In [72]:
out = f(in_meanx, in_stdx, in_target)

In [73]:
out

[array(0.49334241949446894),
 array(1.9784306672279734),
 array(0.5007316959309528),
 array(2.0004942449420238),
 array(3.50234543323941)]

In [77]:

# Yes, let's do some more training
for i in range(10000):
    out = f(in_meanx, in_stdx, in_target)   # Same random input arrays on every step
    if i%1000 == 0: print(out)

[array(1.730620087048809), array(0.06644880405138838), array(1.4827481952684731), array(0.009317388676805285), array(0.8512899069094731)]
[array(1.7344547140655597), array(0.06529631112669755), array(1.4826027379610665), array(0.009346970911675379), array(0.8512882834960577)]
[array(1.7371037852117128), array(0.06450021021093287), array(1.4824934799112637), array(0.009369635469623851), array(0.8512875085458209)]
[array(1.7389338709270818), array(0.06395024587064241), array(1.482416063087569), array(0.009385857957330527), array(0.851287138661258)]
[array(1.7401981448712545), array(0.06357031670432199), array(1.482362238236333), array(0.009397205673835831), array(0.8512869621321053)]
[array(1.7410715195142321), array(0.06330785665305479), array(1.4823250418608427), array(0.009405078694548085), array(0.8512868778885001)]
[array(1.741674844128684), array(0.06312654932356954), array(1.4822993790674543), array(0.009410524922049938), array(0.8512868376874936)]
[array(1.742091612442789), array

Ok, so 0.85 seems to be the lowest CRPS we can get with this input data!

Ok, nice, we have extended our approach to vector input data. Now it is probably time to clean up and look a little bit into how to write good Theano code. Then we can feed real data to the algorithm!