# Cross-Entropy Criterion Layer
In this notebook, we look at the forward and backward passes of the ```nn.CrossEntropyCriterion``` layer. We also see how to compute the gradient of the loss with respect to the output of the network, $\frac{\partial L}{\partial O}$.

#### Input
We explicitly initialize the output values as below:

In [1]:
o = torch.Tensor({3.2, 5.1, -1.7}) 
print(o)

 3.2000
 5.1000
-1.7000
[torch.DoubleTensor of size 3]



#### Target
Assume that the target is the $1^{st}$ class (Torch class indices are 1-based):

In [2]:
t = torch.Tensor({1})
print(t)

 1
[torch.DoubleTensor of size 1]



#### Calculate the Loss
We verify that the loss is equal to the value that we manually calculated in class.

In [3]:
require 'nn';
cec = nn.CrossEntropyCriterion()
err = cec:forward(o, t)
print(err)

2.0403551528002	
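
As a cross-check, the same number can be reproduced by hand from the definition of the loss, $-o_c + \log\sum_j e^{o_j}$ for target class $c$. The sketch below uses only Lua's standard `math` library, so it does not depend on Torch. The helper name `manual_cross_entropy` and its arguments are illustrative assumptions, not part of the `nn` package.

```lua
-- Plain-Lua cross-check of the loss; no Torch required.
-- manual_cross_entropy is an illustrative helper, not part of the nn package.
function manual_cross_entropy(logits, target_class)
  -- Subtract the largest logit before exponentiating for numerical stability.
  local max_logit = logits[1]
  for i = 2, #logits do
    max_logit = math.max(max_logit, logits[i])
  end

  -- log(sum_j exp(o_j)), computed as max + log(sum_j exp(o_j - max)).
  local sum_exp = 0
  for i = 1, #logits do
    sum_exp = sum_exp + math.exp(logits[i] - max_logit)
  end
  local log_sum_exp = max_logit + math.log(sum_exp)

  -- Loss = -o[target_class] + log(sum_j exp(o_j)); classes are 1-based, as in Torch.
  return log_sum_exp - logits[target_class]
end

print(manual_cross_entropy({3.2, 5.1, -1.7}, 1))
```

The printed value should agree with the result of `cec:forward(o, t)` above, up to floating-point rounding.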


#### Gradient of Loss with respect to Output
Now, let us look at the gradient of the loss with respect to the output of the network, $\frac{\partial L}{\partial O}$. We know that the loss is equal to:
<img src="https://raw.githubusercontent.com/stencilman/CS763_Spring2017/master/Lec3%2C4/cec.png" alt="Cross-Entropy Criterion" style="width: 200px;"/>
As we saw in class, this gradient is equal to $\hat{o}-[1, 0, 0]^{T}$, where $\hat{o}=SoftMax(o)$. So, let us now use torch to calculate the gradient of the loss with respect to the output, and then do the same manually.
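
For reference, here is a short sketch of where this expression comes from. It assumes the loss is the standard softmax cross-entropy and writes $c$ for the target class (here $c = 1$). The index symbols $i$, $j$, and $c$ are notation introduced only for this sketch.

$$L = -\log \hat{o}_c = -o_c + \log \sum_j e^{o_j}$$

$$\frac{\partial L}{\partial o_i} = \frac{e^{o_i}}{\sum_j e^{o_j}} - \mathbb{1}[i = c] = \hat{o}_i - \mathbb{1}[i = c]$$

With $c = 1$, the indicator term is the one-hot vector $[1, 0, 0]^{T}$, which gives $\hat{o} - [1, 0, 0]^{T}$.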

In [4]:
ohat = nn.SoftMax():forward(o)
print(ohat)

 0.1300
 0.8690
 0.0010
[torch.DoubleTensor of size 3]



In [5]:
dl_do = cec:backward(o, t)
print(dl_do)

-0.8700
 0.8690
 0.0010
[torch.DoubleTensor of size 3]



In [6]:
target = torch.Tensor({1, 0, 0})
dl_do_manual = ohat - target
print(dl_do_manual)

-0.8700
 0.8690
 0.0010
[torch.DoubleTensor of size 3]



Note how ```dl_do``` and ```dl_do_manual``` match: the gradient returned by ```cec:backward``` is the softmax output minus the one-hot target.