runtime error #1
Yes, I've successfully trained several models. For some reason I cannot reproduce this error on my machine. Did you make sure your repo is up to date with the current master branch?
I am using Python 3, and have not tested it using 2.7, so that is the only thing I can think of at the moment if your local repo is up to date. I will add the lack of 2.7 support to the README if that's the issue.
I didn't build from source but installed PyTorch from pip. I have also made some changes to adapt your code to Python 2.7 (the star-expression syntax). I checked the latest master branch and found that https://github.com/pytorch/pytorch/blob/master/torch/autograd/variable.py#L317-L320 still only supports scalar division. In the case of your code "x /= norm.expand_as(x)", it is clearly an element-wise division. But I don't understand how the Python version can affect this.
BTW, could you please give me a rough time estimate for running one epoch (with machine specs)?
Yeah, I agree; I don't understand how it is working on my computer if that's the case. I'll look into it more after my classes today; sorry I don't have an answer right this second. As for the time estimate, it takes ~1.4 seconds to run a batch of size 32 forward and backward, but I'm not in my lab right now so I can't remember the exact time per epoch. Will get back to you on all of this right after class.
And that's on a single Tesla K80, by the way.
I think if you update to the latest version of PyTorch you will see that element-wise division with .div_() is supported. I do remember that it was originally not supported, but they added it not too long ago. When I run something as simple as:

```python
x = torch.Tensor([1, 2, 3, 4, 5, 6])
y = torch.Tensor([2, 2, 2, 2, 2, 2])
x /= y
```

the correct result is returned. With a batch size of 32, on 1 Tesla K80, it takes me ~109 sec. per epoch.
As I mentioned in the previous post, the latest GitHub PyTorch source code (master branch) still shows:

```python
def div_(self, other):
    if not isinstance(other, Variable) and not torch.is_tensor(other):
        return DivConstant(other, inplace=True)(self)
    raise RuntimeError("div_ only supports scalar multiplication")
```

I still don't understand how it works in your case. But I will try to update my PyTorch to the latest version.
Yeah, I apologize for the lack of a better answer, but since I cannot reproduce this I am closing the issue for now. Let me know if updating PyTorch fixes the issue; I will try to see if I can figure out more info myself in the meantime.
Ah, figured it out. That line in the source code is referring to Variables, so it is just saying Variables cannot be divided by Tensors. Variables can be divided by other Variables of the same size (which is the case here), and Tensors can be divided by other Tensors of the same size. torch/csrc/generic/methods/TensorMath.cwrap line 1038 has what looks like the place that bridges the Python and C for the tensor. So again, not sure what the exact source of the problem is in your case, but my best bet is your version of PyTorch. Hopefully that helps.
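To make that distinction concrete, here is a minimal plain-Python sketch (not PyTorch itself; the function names are hypothetical) of the difference between the scalar division the old `div_` accepted and the element-wise division used in l2norm.py:

```python
def div_elementwise(x, y):
    # Element-wise division: x and y must have the same length,
    # mirroring x /= y on two equal-sized tensors/Variables.
    assert len(x) == len(y), "size mismatch"
    return [a / b for a, b in zip(x, y)]

def div_scalar(x, c):
    # Scalar division: every element divided by one constant,
    # the only case the old Variable.div_ error message allowed.
    return [a / c for a in x]

print(div_elementwise([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # [0.5, 1.0, 1.5]
print(div_scalar([1.0, 2.0, 3.0], 2.0))                   # [0.5, 1.0, 1.5]
```

The two calls agree here only because every element of `y` happens to equal the scalar; in general they are different operations, which is why the runtime check exists.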
Also, an update on training time: it takes approx. 37.5 sec. per epoch with a GTX 1060 and batch size of 16, which is what I am currently using (ran out of money to afford the K80 EC2 instance :P).
Thanks a lot. I will definitely update my PyTorch. Regarding the training time, it only takes 37.5 sec for one epoch? (I suppose you were training on VOC2007 with about 10,000 images, right?) I have tried training an MXNet SSD implementation, which takes about 270 sec per epoch using both VOC2007 and VOC2012 data on my Titan X GPU. Does this mean this PyTorch SSD is even faster than the MXNet implementation? That doesn't seem right.
Yeah, that's my bad. Disregard that number; it's late here. Training on purely the training set (2501 images) from VOC07, it takes on average ~140 sec. per epoch on a single GTX 1060... So yeah, the previous number was off by a lot. I would be curious to see how it compares on a Titan X, though.
One more question ;-) I am wondering how you got the fc-reduced VGG-16 weights?
Hahah, of course... I converted them to Chainer and then from Chainer to PyTorch. I was also able to convert them to Torch and then from Torch to PyTorch, but the specific weight file I supply is one that took the Chainer route.
Hi, I just updated PyTorch to the latest version (0.1.11_5) and ran train.py.
Is this on the first feed-forward, or were you able to get through some iterations? The only time that line has ever been an issue was a while back, when I had an explicit 'background' label in the VOC labelmap and it just became an index-out-of-range issue for softmax. But I'm currently training as I type this and can't think of what could be causing that. Have you pulled the most recent update of master? Or maybe you're on a different branch?
I faced this issue as well, with PyTorch version (0.1.12_4), which is very recent. I fixed it by changing the
I am then facing an issue in the
Edit: I believe there are basic Python 2.7 vs Python 3 compatibility issues which cause the problem, since this code was written for Python 3 and not Python 2.7. Adding the line
Great suggestion, @superhans. Adding
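The thread does not show the exact line that was added, but a common Python 2/3 division-compatibility fix (an assumption on my part, not confirmed by the posts above) is the true-division future import:

```python
from __future__ import division  # in Python 2, makes / behave like Python 3 true division

# Under Python 2 without the import, 7 / 2 == 3 (ints floor-divide);
# with the import, or under any Python 3, it is 3.5.
print(7 / 2)   # 3.5
print(7 // 2)  # 3  (explicit floor division in both versions)
```

Silent integer floor division is exactly the kind of discrepancy that can shift box coordinates or indices and trigger a device-side assert later in CUDA code.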
Hi, have you successfully run train.py?
I encountered a runtime error saying "div_ only supports scalar multiplication" from the line "x /= norm.expand_as(x)" in modules/l2norm.py.
Then I changed this line to "x = x.div(norm.expand_as(x))" but got another CUDA runtime error, "device-side assert triggered", from the line "return torch.cat([g_cxcy, g_wh], 1)" in box_utils.py.
BTW, I am using Python 2.7 instead of Python 3.
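For context, the failing line in modules/l2norm.py is performing L2 normalization: dividing each element of the feature map by the vector's L2 norm. A minimal plain-Python sketch of the same computation (a hypothetical helper, assuming a 1-D vector for simplicity; the real layer also multiplies by a learned scale):

```python
import math

def l2_normalize(x, eps=1e-10):
    # Divide each element by the vector's L2 norm; this is the same
    # element-wise division as x /= norm.expand_as(x), with a small
    # epsilon to avoid dividing by zero on an all-zero vector.
    norm = math.sqrt(sum(v * v for v in x)) + eps
    return [v / norm for v in x]

print(l2_normalize([3.0, 4.0]))  # approximately [0.6, 0.8]
```

Since every element is divided by a different per-position norm once the tensor is batched, this genuinely requires element-wise (not scalar) division support.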