# Python implementation of softmax loss layer #4023

Closed · opened Apr 20, 2016 · 1 comment

### biprajiman commented Apr 20, 2016 (edited)

Hi,

I am trying to implement the softmax loss layer in Python to use with pycaffe. I followed the Euclidean loss example and wrote a simple starting point:

```python
import caffe
import numpy as np


class SoftmaxLossLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # check input dimensions match
        if bottom[0].num != bottom[1].num:
            raise Exception("Inputs must have the same dimension.")
        # difference is shape of inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    def forward(self, bottom, top):
        scores = bottom[0].data
        exp_scores = np.exp(scores)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        labels = np.array(bottom[1].data, dtype=np.uint16)
        correct_logprobs = -np.log(probs[range(bottom[0].num), labels])
        data_loss = np.sum(correct_logprobs) / bottom[0].num

        self.diff[...] = probs
        top[0].data[...] = data_loss

    def backward(self, top, propagate_down, bottom):
        delta = self.diff
        labels = np.array(bottom[1].data, dtype=np.uint16)

        for i in range(2):
            if not propagate_down[i]:
                continue
            if i == 0:
                delta[range(bottom[0].num), labels] -= 1

            bottom[i].diff[...] = delta / bottom[0].num
```

The code works with a simple LeNet, and the loss seems to be decreasing. I would be willing to clean this code up to standard and share it. I need guidance on what I am missing (I read the C++ code, and this is far from what it does) so I can modify the code to match the C++ implementation and make it more generic.

You may ask why to go through this trouble: modifying Python code to create a new loss is easier for me than working through the C++ code, which might take a long time.

Thank you in advance for any help.

Contributor

### seanbell commented Apr 20, 2016 (edited)

While a Python layer is nice for academic/learning purposes, there's no need for it in Caffe since the C++ one is faster and uses the GPU. Also note that your forward expression is numerically unstable; you should look into lectures explaining softmax (e.g. http://cs231n.github.io/linear-classify/#softmax) to see how to fix it.

I'm closing this since it's a modeling/usage question. Please continue the discussion on the mailing list. From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

> Please do not post usage, installation, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.
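For reference, the stabilization being pointed to here is the standard max-subtraction trick: softmax is invariant to adding a constant to every score in a row, so subtracting each row's maximum before exponentiating leaves the probabilities unchanged while keeping `np.exp` from overflowing on large logits. A minimal NumPy sketch (the function name `stable_softmax` is illustrative, not part of Caffe):

```python
import numpy as np


def stable_softmax(scores):
    """Row-wise softmax; subtracting the per-row max leaves the
    result mathematically unchanged but avoids overflow in np.exp."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)


# Logits this large overflow the naive np.exp(scores) formulation,
# but the shifted version stays finite and sums to 1 per row.
scores = np.array([[1000.0, 1001.0, 1002.0]])
probs = stable_softmax(scores)
```

The same shift would slot into the `forward` method above in place of the bare `np.exp(scores)` call, with no change to the loss value or the gradient.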