
Fix log_loss and add test for multiclass #28

Closed
wants to merge 1,007 commits into from

Conversation

@glennq commented Aug 28, 2015

In MLPClassifier, the loss is set to 'log_loss' for both the multiclass and the multilabel case.

The log_loss implementation in neural_network/base.py is in fact binary cross-entropy, but in the multiclass case, where self.out_activation_ is set to 'softmax', it does not make sense to use a binary cross-entropy loss. I think categorical cross-entropy, also called negative log-likelihood, should be used instead:

return -np.sum(y_true * np.log(y_prob)) / y_prob.shape[0]
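For reference, here is a minimal NumPy sketch of the two losses being discussed; the function names mirror those in the PR, but the bodies are illustrative rather than the repository code:

```python
import numpy as np

def binary_log_loss(y_true, y_prob):
    # Binary cross-entropy, applied to each output unit independently;
    # suited to sigmoid outputs in the binary/multilabel case.
    return -np.sum(y_true * np.log(y_prob) +
                   (1 - y_true) * np.log(1 - y_prob)) / y_prob.shape[0]

def log_loss(y_true, y_prob):
    # Categorical cross-entropy (negative log-likelihood); the proposed loss
    # when out_activation_ is 'softmax' and each row of y_prob sums to one.
    return -np.sum(y_true * np.log(y_prob)) / y_prob.shape[0]

# Toy 3-class example with one-hot targets
y_true = np.array([[1., 0., 0.], [0., 1., 0.]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
print(log_loss(y_true, y_prob))         # only the probability of the true class counts
print(binary_log_loss(y_true, y_prob))  # also penalizes the zero entries
```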


@amueller (Owner) commented on `def binary_log_loss(y_true, y_prob):`
Don't you think calling it multilabel_log_loss would be more informative?

@glennq (Author) replied:
You mean renaming binary_log_loss to multilabel_log_loss? That makes sense; it's just that this loss function is actually used in both the binary and the multilabel case, since I made a minimal change to BaseMultilayerPerceptron._initialize(). I took the 'binary' here from 'binary cross-entropy'.
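For illustration, the selection described here could look roughly like the sketch below; the attribute and function names follow the discussion above, but the control flow is an assumption, not the actual `_initialize()` code:

```python
def _select_loss(out_activation_):
    # Hypothetical sketch of choosing the loss from the output activation
    # inside BaseMultilayerPerceptron._initialize(); not the real source.
    if out_activation_ == 'softmax':
        # multiclass: categorical cross-entropy / negative log-likelihood
        return 'log_loss'
    # 'logistic' output: the binary and multilabel cases share binary cross-entropy
    return 'binary_log_loss'
```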

@amueller (Owner) commented:
I agree, that's a bug.
It would be good to have a non-regression test.
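A non-regression test along those lines might check the multiclass loss against the categorical cross-entropy computed by hand. This is only a sketch; the import path is taken from the PR description (neural_network/base.py) rather than verified against the branch:

```python
import numpy as np
from numpy.testing import assert_almost_equal

from sklearn.neural_network.base import log_loss  # path as described in the PR


def test_log_loss_multiclass():
    # One-hot targets for a 3-class problem; the expected value is the
    # mean negative log-probability assigned to the true class.
    y_true = np.array([[1., 0., 0.], [0., 0., 1.]])
    y_prob = np.array([[0.8, 0.1, 0.1], [0.3, 0.3, 0.4]])
    expected = -np.mean(np.log([0.8, 0.4]))
    assert_almost_equal(log_loss(y_true, y_prob), expected)
```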

@amueller (Owner) commented:
I'm not sure if it is better to work on this branch or on scikit-learn#3204. But we can fix this here for now.

@glennq (Author) commented Aug 28, 2015

I'm also confused about which pull request is going to be merged into master, but I guess the two pull requests, scikit-learn#3939 and scikit-learn#3204, need to be sorted out first.

@amueller (Owner) commented:
Unfortunately that is not entirely decided. scikit-learn#3204 is more "up to date", but I'm not sure whether all the additions there are useful.
It would be good to benchmark and review the implemented learning algorithms before deciding which way to go.

@glennq (Author) commented Aug 28, 2015

I see. I was trying to compare the two pull requests on GitHub but couldn't get an informative comparison.
I guess the main difference is the newly added learning_rate_class, which implements a few learning rate adaptation schemes for SGD (roughly the kind sketched below).
So I think you were talking about benchmarking those algorithms; that makes sense.

I'm just wondering whether it would be better to merge an extensible, minimal working implementation first and consider new features later, since this pull request has been around for nearly two years...
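For context, the kind of schedules usually meant by learning rate adaptation for SGD look roughly like this; the schedule names echo scikit-learn's usual vocabulary, but this is an illustrative sketch, not the learning_rate_class code from scikit-learn#3204:

```python
def learning_rate(schedule, eta0, t, power_t=0.5):
    # Two common SGD schedules: a fixed step size, and one that decays
    # with the iteration count t.
    if schedule == 'constant':
        return eta0
    if schedule == 'invscaling':
        return eta0 / (t + 1) ** power_t
    raise ValueError("unknown schedule: %r" % schedule)
```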

@amueller (Owner) commented:
I mostly agree with your assessment. The problem is that we need working default values; if the defaults just don't work, people might be discouraged from using it.
So we need at least one well-working example. And we probably need to include early stopping.

I just wrote an email to some people who have worked on this and on deep learning in general to ask their opinion.

Apart from benchmarking the learning schedules (these are not really learning rates imho), the other thing would be investigating early stopping.
I think it is not included in my branch yet, but there should be an option to split the provided dataset into a test and validation set, and do early stopping on the validation set.
It would be great if you could provide an example where this is beneficial.
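A minimal sketch of that kind of early stopping, splitting off a validation set from the provided data and stopping when the validation score stops improving; the parameter names (validation_fraction, n_iter_no_change) are illustrative, not the branch's API:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def fit_with_early_stopping(model, X, y, validation_fraction=0.1,
                            n_iter_no_change=10, max_iter=200):
    # Hold out part of the data for validation and stop once the validation
    # score has not improved for n_iter_no_change consecutive epochs.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=validation_fraction, random_state=0)
    classes = np.unique(y)
    best_score, stale = -np.inf, 0
    for it in range(max_iter):
        if it == 0:
            model.partial_fit(X_train, y_train, classes=classes)
        else:
            model.partial_fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_score, stale = score, 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                break
    return model
```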

yanlend and others added 23 commits October 19, 2015 13:46
…into patch-1

Conflicts:
	doc/whats_new.rst
    Empty "residues_" in "LinearRegression": documented the case when the "residues_" attribute is empty.
…d_stuff

[MRG] MAINT remove deprecated stuff that will no longer be supported in 0.18
[MRG+2] MAINT: deprecation warns from StandardScaler std_
[MRG + 1] Bug fix for unnormalized laplacian
Circle ci to build the documentation (only after merge to master)
…raph_is_connected

[MRG+1] Optimize sklearn.manifold._graph_is_connected
…huffle

[MRG+1] Remove shuffling in LabelKFold
Residues can be empty if the rank of X does not satisfy the
conditions described in scipy.linalg.lstsq documentation.
As residues are not that useful (i.e. we're not doing
stat testing), this property is deprecated and will be
removed in sklearn 0.19.
zermelozf and others added 29 commits October 21, 2015 10:24
[MRG+1] The code was raising an exception while plotting 2nd curve
[MRG + 1] Removed deprecated stuff in 0.18
…-1-row-csr-fix

[MRG + 1] max abs scaler 1 row csr fix
…zed-svd

[MRG + 2] ENH: optimizing power iterations phase for randomized_svd
…age as default multioutput parameter of r2_score function after 0.19)

adding deprecation message

fixing deprecation message
confirm deprecation starting from 0.16 and removal after 0.18
Used to print "('violation:', 1.0)". Now "violation: 1.0".
Empty `residues_` in `LinearRegression`: docstring updated
Changed imports in test to separate testing of API and of internals.

See scikit-learn#5509.
[MRG + 1] Remove deprecated stuff from SVM
…metrics_second_pass

[MRG+2] Use uniform_average as default multioutput parameter of r2_score function
…metrics

[MRG+1] removing deprecated files in metrics
…sionadded

DOC versionadded randomized_svd
@amueller closed this Oct 23, 2015