
Matrix sum in Neural network's cost function #4

Open

clumdee opened this issue Aug 29, 2017 · 2 comments

clumdee commented Aug 29, 2017

Hi Jordi,

First of all, thanks so much for the notebooks. They really help me follow along with the course.
I have a question about your notebook 4, `nnCostFunction` -- specifically the line `J = ... np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)))`.

I think this performs matrix multiplication, giving a 10×10 matrix (or n_label × n_label). Let's call this cost-matrix Jc. The Jc matrix contains not only how a set of predicted values for one label differs from its corresponding target (the diagonal elements), but also how it differs from the targets of the other labels (the off-diagonal elements). For example, the multiplication would multiply a column of predicted values `np.log(a3.T)` for one label (e.g. k) with all columns of targets.

The code then sums all the elements of this matrix, which seems to over-count J. Instead of summing all the elements, I think only the diagonal elements are needed.
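Here is a minimal numpy sketch of the concern, using toy shapes and random values as hypothetical stand-ins for the notebook's `a3` and `y_matrix` (and `@` to force an actual matrix product):

```python
import numpy as np

# Toy stand-ins (hypothetical shapes and values, not the notebook's data):
# a3 has shape (num_labels, m); y_matrix has shape (m, num_labels), one-hot rows.
rng = np.random.default_rng(0)
m, num_labels = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(num_labels, m))
y_matrix = np.eye(num_labels)[rng.integers(num_labels, size=m)]

# Read as a matrix product, the log term becomes (num_labels, num_labels):
Jc = np.log(a3) @ y_matrix

print(Jc.shape)      # (3, 3) -- the "cost-matrix" Jc described above
print(np.sum(Jc))    # sums diagonal AND off-diagonal (cross-label) terms
print(np.trace(Jc))  # diagonal only -- the part the cost formula actually needs
```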

Please refer to the picture below; it may help clarify my description.
[Photo attachment: handwritten sketch illustrating the cost-matrix Jc]

Please let me know if I misunderstood the code.

Best regards and thanks again,
-Tua

JWarmenhoven (Owner) commented

Hi Tua,

The code you refer to above is the implementation of the Regularized Cost Function shown just above the code in the notebook and in section 1.4 of the Coursera exercise document. It returns a single scalar value (not a matrix), assigned to the variable `J`.
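For example, with plain numpy ndarrays standing in for `a3` and `y_matrix` (hypothetical toy values, same shapes as in the comment above), `*` acts elementwise, so the summand keeps its (m, num_labels) shape and `np.sum` collapses it to one number:

```python
import numpy as np

# Toy stand-ins (hypothetical values):
rng = np.random.default_rng(0)
m, num_labels = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(num_labels, m))               # (num_labels, m)
y_matrix = np.eye(num_labels)[rng.integers(num_labels, size=m)]  # (m, num_labels)

# On ndarrays, `*` multiplies elementwise, so each term keeps shape (m, num_labels)
# and np.sum reduces it to a single number -- the scalar cost J.
term = np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix)
J = -(1 / m) * np.sum(term)
print(term.shape, J)  # (5, 3) and one scalar value
```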

I am not sure I understand what you mean by 'over-calculating' the cost J.

clumdee (Author) commented Aug 29, 2017

Hi Jordi,

I understand that the code is the implementation of the Regularized Cost Function shown above it.

What I meant is: I think the `np.sum` in `J = -1*(1/m)*np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)))` should be replaced with a sum over only the diagonal elements of `(np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix))`. `np.sum` adds up all the elements of, for example, the output matrix in the image below, and that is not the same as the Regularized Cost Function the code refers to.

For simplicity, I only wrote out the output of `np.log(a3.T)*(y_matrix)`, but the same argument applies to `np.log(1-a3).T*(1-y_matrix)`.
[Photo attachment: handwritten example of the output matrix of np.log(a3.T)*(y_matrix)]
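To make the comparison concrete, here is a sketch under the same toy setup as above (hypothetical stand-ins for `a3` and `y_matrix`): summing only the diagonal of the matrix product matches the elementwise sum, while summing all of its elements does not.

```python
import numpy as np

# Toy stand-ins (hypothetical values):
rng = np.random.default_rng(0)
m, num_labels = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(num_labels, m))
y_matrix = np.eye(num_labels)[rng.integers(num_labels, size=m)]

elementwise = np.sum(np.log(a3.T) * y_matrix)  # ndarray `*`: elementwise product, then sum
diag_only = np.trace(np.log(a3) @ y_matrix)    # proposed fix under the matrix-product reading
full_sum = np.sum(np.log(a3) @ y_matrix)       # summing ALL elements of the matrix product

print(np.isclose(elementwise, diag_only))  # True: diagonal-only sum matches
print(np.isclose(elementwise, full_sum))   # False: off-diagonal cross-label terms change the sum
```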

Please let me know your thoughts.

Best regards,
-Tua
