
Matrix sum in Neural network's cost function #4

Open

clumdee opened this issue Aug 29, 2017 · 2 comments

clumdee commented Aug 29, 2017

Hi Jordi,

First of all, thanks so much for the notebooks. They really help me follow along with the course.
I have a question about your notebook 4, `nnCostFunction` -- specifically the line `J = ... np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)))`.

I think this performs matrix multiplication, giving a 10×10 matrix (or n_label × n_label). Let's call this cost-matrix Jc. The Jc matrix contains not only how a set of predicted values for one label differs from its corresponding target (the diagonal elements), but also how it differs from the targets of the other labels (the off-diagonal elements). For example, the multiplication would multiply a column of predicted values `np.log(a3.T)` for one label (e.g. k) with all columns of targets.

The code then sums all the elements of this matrix, which seems to over-count J. Instead of summing all the elements, I think only the diagonal elements are needed.
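Here is a minimal numpy sketch of the concern, using toy shapes and random values as hypothetical stand-ins for the notebook's `a3` and `y_matrix` (and `@` to force an actual matrix product):

```python
import numpy as np

# Toy stand-ins (hypothetical shapes and values, not the notebook's data):
# a3 has shape (num_labels, m); y_matrix has shape (m, num_labels), one-hot rows.
rng = np.random.default_rng(0)
m, num_labels = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(num_labels, m))
y_matrix = np.eye(num_labels)[rng.integers(num_labels, size=m)]

# Read as a matrix product, the log term becomes (num_labels, num_labels):
Jc = np.log(a3) @ y_matrix

print(Jc.shape)      # (3, 3) -- the "cost-matrix" Jc described above
print(np.sum(Jc))    # sums diagonal AND off-diagonal (cross-label) terms
print(np.trace(Jc))  # diagonal only -- the part the cost formula actually needs
```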

Please refer to the picture below; it may help clarify my description.
[Photo attachment: handwritten sketch illustrating the cost-matrix Jc]

Please let me know if I misunderstood the code.

Best regards and thanks again,
-Tua

JWarmenhoven (Owner) commented

Hi Tua,

The code you refer to above is the implementation of the Regularized Cost Function shown just above the code in the notebook and in section 1.4 of the Coursera exercise document. It returns a single scalar value (not a matrix), assigned to the variable `J`.
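For example, with plain numpy ndarrays standing in for `a3` and `y_matrix` (hypothetical toy values, same shapes as in the comment above), `*` acts elementwise, so the summand keeps its (m, num_labels) shape and `np.sum` collapses it to one number:

```python
import numpy as np

# Toy stand-ins (hypothetical values):
rng = np.random.default_rng(0)
m, num_labels = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(num_labels, m))               # (num_labels, m)
y_matrix = np.eye(num_labels)[rng.integers(num_labels, size=m)]  # (m, num_labels)

# On ndarrays, `*` multiplies elementwise, so each term keeps shape (m, num_labels)
# and np.sum reduces it to a single number -- the scalar cost J.
term = np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix)
J = -(1 / m) * np.sum(term)
print(term.shape, J)  # (5, 3) and one scalar value
```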

I am not sure I understand what you mean by 'over-calculating' the cost J.

clumdee (Author) commented Aug 29, 2017

Hi Jordi,

I understand that the code is the implementation of the Regularized Cost Function shown above it.

What I meant is: I think the `np.sum` in `J = -1*(1/m)*np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)))` should be replaced with a sum over only the diagonal elements of `(np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix))`. `np.sum` adds up all the elements of, for example, the output matrix in the image below, and that is not the same as the Regularized Cost Function the code refers to.

For simplicity, I only wrote out the output of `np.log(a3.T)*(y_matrix)`, but the same argument applies to `np.log(1-a3).T*(1-y_matrix)`.
[Photo attachment: handwritten example of the output matrix of np.log(a3.T)*(y_matrix)]
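To make the comparison concrete, here is a sketch under the same toy setup as above (hypothetical stand-ins for `a3` and `y_matrix`): summing only the diagonal of the matrix product matches the elementwise sum, while summing all of its elements does not.

```python
import numpy as np

# Toy stand-ins (hypothetical values):
rng = np.random.default_rng(0)
m, num_labels = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(num_labels, m))
y_matrix = np.eye(num_labels)[rng.integers(num_labels, size=m)]

elementwise = np.sum(np.log(a3.T) * y_matrix)  # ndarray `*`: elementwise product, then sum
diag_only = np.trace(np.log(a3) @ y_matrix)    # proposed fix under the matrix-product reading
full_sum = np.sum(np.log(a3) @ y_matrix)       # summing ALL elements of the matrix product

print(np.isclose(elementwise, diag_only))  # True: diagonal-only sum matches
print(np.isclose(elementwise, full_sum))   # False: off-diagonal cross-label terms change the sum
```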

Please let me know your thoughts.

Best regards,
-Tua
