Non-Square image problems #693

Closed
TimZaman opened this issue Apr 19, 2016 · 9 comments

@TimZaman
Contributor

Working with non-square images. Has any of you tried this?
I cannot train because the initial Validation() run in main.lua crashes with torch error code -11. Probably a memory or access issue?

Let's say my dataset is 30x1. Creating the LMDB works fine.
The Torch output log then tells me:

Image channels are 1, Image width is 1 and Image height is 30

So that's not right: I specified 30x1, and by universal and DIGITS convention that should indeed be width x height.
Then, in the data.lua source code, my eye falls on https://github.com/NVIDIA/DIGITS/blob/master/tools/torch/data.lua#L281
logmessage.display(0,'Image channels are ' .. self.ImageChannels .. ', Image width is ' .. self.ImageSizeY .. ' and Image height is ' .. self.ImageSizeX)

Huh? Width is Y and height is X? What's going on here? We need to revise that.

This got so weird that I decided to recreate the db. I made the non-working one with PNG + squash. Then I decided to try None image encoding + crop (although I did not actually crop anything). That made it work! Yay!
Then I went back to the default squash with None image encoding, and that worked too. So something's up with the creation or loading of non-square PNGs.
To make things even weirder, when I resized my 30x1 input to 1x30, the model could run! This means it could load PNGs this time. I think the squashing recomputation fixed something there.

Also, in all cases the dataset browsing works fine. Can anyone help me debug this (on the data.lua end)?

Sidenote: why are we allocating 2x the amount of data per image (https://github.com/NVIDIA/DIGITS/blob/master/tools/torch/data.lua#L351)? Something to do with JPEGs?

@gheinrich
Contributor

Hi, thanks for the report.

> Working with non-square images. Did any of you try for this?

This is tested as part of the automatic Travis tests on every commit. There is also an example there. Of course we can't exclude the possibility of a bug lurking somewhere.

> by universal and digits convention it should indeed be width*height

I am in slight disagreement with this statement. Read this or this or this.

Maybe your first model was crashing because you made a wrong assumption on the image format?

> Sidenote: why are we allocating 2x the amount of data per image: https://github.com/NVIDIA/DIGITS/blob/master/tools/torch/data.lua#L351 something to do with jpegs?

Is the comment on that line not helpful?

@TimZaman
Contributor Author

I mean that when talking about width and height, one usually says 'width x height'. When loading a dataset in DIGITS, there are two edit boxes that do not indicate which is which. That is why I assume the boxes are:
[width] [height]
I did not mean the way Caffe or Torch order their dimensions. In any case, notice again the line where it says
Image width is ' .. self.ImageSizeY
It seems to say 'image width is height'.

@gheinrich
Contributor

You're right, the log message is wrong. Hopefully the rest is OK!
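
For illustration, here is a minimal Python sketch of what a correctly labeled message would look like. It is not the actual data.lua fix; `describe_dims` is a hypothetical helper, and the only point is that each label sits next to its own value, so a dataset that is 30 wide and 1 high is reported as such.

```python
def describe_dims(channels, height, width):
    """Build a log line that pairs each label with its own value."""
    return (
        "Image channels are %d, Image width is %d and Image height is %d"
        % (channels, width, height)
    )


# A dataset that is 30 wide and 1 high (30x1 as width x height).
message = describe_dims(channels=1, height=1, width=30)
print(message)
```

Passing the values by keyword, as above, makes it harder to swap width and height by accident at the call site.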

@lukeyeager
Member

We used to have a bunch of bugs related to this in DIGITS. I wrote this little guide to help keep me sane (not that it's particularly clear):
https://github.com/NVIDIA/DIGITS/blob/v3.2.0/digits/utils/image.py#L20-L33

I think we're following it pretty well, at least on the DIGITS/Caffe side of things. I don't use Torch as much, but as Greg said we do have tests for that.
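
To make the ordering problem concrete, here is a small Python sketch. It does not reproduce the linked guide; the helper names are hypothetical. It assumes the usual conventions: user-facing sizes are 'width x height' (as PIL's `Image.size` reports them), while array and tensor shapes put height before width, e.g. (height, width, channels) for NumPy image arrays and, commonly, (channels, height, width) for Torch tensors.

```python
def wh_to_chw(width, height, channels=1):
    """Convert a user-facing 'width x height' size to (channels, height, width)."""
    return (channels, height, width)


def chw_to_wh(shape):
    """Recover the user-facing (width, height) pair from a (C, H, W) shape."""
    _channels, height, width = shape
    return (width, height)


def size_of_rows(rows):
    """Return (width, height) for an image stored as a list of pixel rows."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    return (width, height)


wide_image = [[0] * 30]  # one row of 30 pixels: 30 wide, 1 high
shape = wh_to_chw(*size_of_rows(wide_image))
print(shape)             # (1, 1, 30)
print(chw_to_wh(shape))  # (30, 1)
```

Keeping the conversion in one pair of functions means the width/height swap happens in exactly one place instead of being repeated wherever a shape is read.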

gheinrich added a commit to gheinrich/DIGITS that referenced this issue Apr 20, 2016
The dataset creation form does not explicitly state which of the image dimension fields is the width and which is the height.
Though this can be inferred through `See example` it would probably help to make this explicit.
See NVIDIA#693
@TimZaman
Contributor Author

Darn. I have so many issues with wide images in Torch. Caffe has no problem with them, but Torch crashes after dataLoader:waitNext() with torch error -11.
Or (after successfully loading a few images):
NVIDIA/digits/tools/torch/main.lua:671: attempt to perform arithmetic on local 'data_batch_size' (a nil value)
I see the same thing with generic models that are wide as I did with classification models that were wide. The dataset was properly formatted; Caffe works with it just fine.

gheinrich added a commit to gheinrich/DIGITS that referenced this issue Apr 21, 2016
@gheinrich
Contributor

Hi @TimZaman, I managed to reproduce the same error message by setting the crop length to something larger than at least one of the image dimensions. You get this message when an exception occurs in one of the data loader threads. If you don't have torch/threads@a9d4eca, can you check with #699 and see if you get a more helpful message?
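
As a rough illustration of the failing condition, here is a Python sketch of an up-front check. This is not DIGITS code: `check_crop` is a hypothetical helper, and it assumes a square crop described by a single length. The idea is to reject an oversized crop with a readable message before any loader thread runs into it.

```python
def check_crop(crop_len, width, height):
    """Raise a descriptive error if a square crop cannot fit in the image."""
    if crop_len <= 0:
        raise ValueError("crop length must be positive, got %d" % crop_len)
    # The crop must fit in both dimensions, so compare against the smaller one.
    if crop_len > min(width, height):
        raise ValueError(
            "crop length %d exceeds image size %dx%d (width x height)"
            % (crop_len, width, height)
        )
    return crop_len


try:
    check_crop(crop_len=28, width=30, height=1)
    outcome = "crop fits"
except ValueError as err:
    outcome = str(err)
print(outcome)
```

For a 30x1 image, any crop length above 1 trips this check, which matches the kind of wide input described in this thread.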

@gheinrich
Contributor

@TimZaman, are you still having issues with non-square images? Did #699 help find the root cause?

@gheinrich
Contributor

@TimZaman can you share an update? Thanks!

@gheinrich
Contributor

Closing due to inactivity.
