add cast floatX to avoid theano warning that cuda array can not cast from float64 or int64 #606
Conversation
How do I reproduce such a warning? I ran the MNIST example on my GPU and everything was clean.
Add Xavier weight initialization and BinaryMisclassificationRate, which is similar to MisclassificationRate but works with a binary (scalar) target.
@udibr, as I said in my previous message, I cannot reproduce the warning you report by running the MNIST demo. Can you shed some light on when it actually happens?
class Xavier(NdarrayInitialization):
    """Initialize with Gaussian distribution with Xavier parameters.

    Use the following gaussian parameters: mean=0 and var=scale/Nin
Apparently you forgot a square root here. Also we would need a reference to the paper here.
I've added a reference (and changed var to std so as not to confuse the reader). However, the original paper uses a uniform distribution and I am using a Gaussian, so I should either change my code, not call it Xavier, or keep it as is (most places use a Gaussian).
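To make the two variants being discussed concrete, here is a minimal NumPy sketch of both initializations. The function names and the `scale` parameter are illustrative, not the PR's actual API; the Gaussian variant uses std = sqrt(scale/Nin) (i.e. with the square root the reviewer asked for), and the uniform variant uses the bound from Glorot & Bengio (2010):

```python
import numpy as np

def glorot_normal(rng, fan_in, fan_out, scale=1.0):
    # Gaussian variant: zero mean, std = sqrt(scale / fan_in).
    # Note the square root -- var = scale / fan_in, not std.
    std = np.sqrt(scale / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def glorot_uniform(rng, fan_in, fan_out):
    # Uniform variant from Glorot & Bengio (2010):
    # W ~ U[-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

rng = np.random.RandomState(0)
w = glorot_normal(rng, 400, 300)
print(w.std())  # close to sqrt(1/400) = 0.05
```

Both choices give weights with the same variance in expectation; the paper's argument only constrains the variance, which is why "most places" feel free to substitute a Gaussian.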
Because this is sort of urgent, I opened a new PR, #635, for this issue to be merged right away. However, you can continue working on this one and we will be happy to merge it later, when it is ready.
@@ -58,7 +58,8 @@ def cost_matrix(self, y, y_hat):
 class CategoricalCrossEntropy(Cost):
     @application(outputs=["cost"])
     def apply(self, y, y_hat):
-        cost = tensor.nnet.categorical_crossentropy(y_hat, y).mean()
+        cost = tensor.nnet.categorical_crossentropy(y_hat, y).mean(
+            acc_dtype=floatX)
Why do you think we need this?
It resolved, for me, the same error messages I got in MisclassificationRate regarding float64 and CUDA. I guess this happens when y is int64.
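The underlying issue can be demonstrated without Theano: NumPy follows the same promotion rule, where combining int64 with float32 yields float64, which the old CUDA backend could not handle. This is a NumPy analogue, not the PR's code; `floatX` here is just a local string standing in for `theano.config.floatX`:

```python
import numpy as np

y = np.array([0, 1, 1, 0], dtype='int64')            # integer labels
y_hat = np.array([0.1, 0.9, 0.4, 0.2], dtype='float32')

# Mixing int64 with float32 silently promotes to float64 --
# this is the promotion behind the CUDA warning.
print(np.result_type(y.dtype, y_hat.dtype))          # float64

# Casting explicitly (or accumulating in floatX, as the diff does
# with acc_dtype) keeps the result in float32.
floatX = 'float32'
cost = (y.astype(floatX) * y_hat).mean(dtype=floatX)
print(cost.dtype)                                    # float32
```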
I think it will be fixed with Theano/Theano#2913. I also checked that, given a float32 first argument and an int64 second argument, it returns float32, which is just what we need:

In [5]: tensor.nnet.categorical_crossentropy(numpy.array([[0.5, 0.5], [0.5, 0.5]], dtype='float32'), numpy.array([0, 1], dtype='int64')).eval()
Out[5]: array([ 0.69314718,  0.69314718], dtype=float32)
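The quoted value can be sanity-checked without Theano: categorical cross-entropy is the per-row negative log-probability of the target class, so a uniform [0.5, 0.5] prediction always yields ln 2 ≈ 0.6931. A plain-NumPy equivalent (the function name is illustrative):

```python
import numpy as np

def categorical_crossentropy(p, targets):
    # Per-row negative log-probability of the target class,
    # mirroring tensor.nnet.categorical_crossentropy.
    return -np.log(p[np.arange(len(targets)), targets])

p = np.array([[0.5, 0.5], [0.5, 0.5]], dtype='float32')
t = np.array([0, 1], dtype='int64')
print(categorical_crossentropy(p, t))  # both entries are ln 2
```

Note the result stays float32 even though the targets are int64, since the integers are only used for indexing.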
Xavier initialization and the fix in reverse_words look good to me, but please rebase the PR against the latest master (see http://blocks.readthedocs.org/en/latest/development/pull_request.html for instructions).
@@ -223,3 +223,33 @@ def generate(self, rng, shape):
            replace=False)
        weights[i, random_indices] = values[i]
    return weights


class Xavier(NdarrayInitialization):
"Xavier" is the inventor's first name, so it's kind of weird to call it that. "Glorot" or "GlorotBengio" might be a better name...
def apply(self, y, y_hat):
    # Here we have to cast both operands to floatX explicitly,
    # because int64 / float32 = float64 in Theano, unfortunately.
    return (tensor.sum(tensor.neq(y, y_hat > 0.5)).astype(floatX) /
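A NumPy analogue of this snippet (illustrative only, not the PR's Theano code) shows why the `astype(floatX)` is needed: the mismatch count is int64, and dividing int64 by a float32 batch size would promote the rate to float64:

```python
import numpy as np

floatX = 'float32'

def binary_misclassification_rate(y, y_hat):
    # Threshold predictions at 0.5, count mismatches, and cast the
    # int64 count to floatX before dividing so the rate stays float32.
    errors = np.sum(y != (y_hat > 0.5)).astype(floatX)
    return errors / np.asarray(y.shape[0], dtype=floatX)

y = np.array([0, 1, 1, 0], dtype='int64')
y_hat = np.array([0.1, 0.9, 0.8, 0.7], dtype='float32')
rate = binary_misclassification_rate(y, y_hat)
print(rate, rate.dtype)  # 0.25 float32
```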
This is almost a copy of MisclassificationRate, which is not nice. You could add it as a second application method on MisclassificationRate, e.g. apply_binary, with the common code moved to a private method.
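The refactor suggested here could look like the following sketch, written as a plain Python/NumPy stand-in for the actual Blocks brick (the real class would keep the `@application` decorators and Theano ops; `_rate` is a hypothetical name):

```python
import numpy as np

floatX = 'float32'

class MisclassificationRate:
    """Sketch of one brick exposing both application methods."""

    def _rate(self, mismatches, batch_size):
        # Shared private method: cast the int64 mismatch count to
        # floatX before dividing, avoiding promotion to float64.
        return (np.sum(mismatches).astype(floatX) /
                np.asarray(batch_size, dtype=floatX))

    def apply(self, y, y_hat):
        # Categorical case: compare labels against argmax predictions.
        return self._rate(y != y_hat.argmax(axis=1), y.shape[0])

    def apply_binary(self, y, y_hat):
        # Binary case: threshold a scalar probability at 0.5.
        return self._rate(y != (y_hat > 0.5), y.shape[0])
```

This keeps one class per metric while still offering both the categorical and binary entry points, which is the alternative to the separate BinaryMisclassificationRate class debated below.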
I also thought this way, but then I noticed that the same file has both BinaryCrossEntropy and CategoricalCrossEntropy, so I thought it would be more appropriate to have separate misclassification rates.
It seems like your pull request contains your master branch. Can you make it a bit clearer which of these changes you actually want to contribute to Blocks? It would also be better to have separate PRs for the initialization scheme, Adagrad, etc.
You are right. I should have a separate branch for each feature instead of everything on master. I am closing this pull request and will try to fix this.
add cast floatX to avoid theano warning that cuda array can not cast from float64 or int64