Suggestion: Define X,Y grid so that they include -1 and 1 #15

Closed
simonhessner opened this issue Jun 22, 2019 · 6 comments

Comments

@simonhessner

simonhessner commented Jun 22, 2019

I have read the paper and was wondering if there is a fix for the problem stated on page 8:

"Analysis of misclassified examples revealed that DSNT was less accurate for predicting edge case joints that lie very close to the image boundary, which is expected due to how the layer works."

The reason seems to be that the X and Y grids are defined to lie in the open range (-1,1) by the formulas on page 4. Is there a specific reason for this, or would DSNT also work with grids in the closed range [-1,1]?

A formula to define such a grid would be

-1 + (2*(i-1)) / (w-1)

For a heatmap that has the width 5, the grid would have these values in the columns:

i=1 => -1
i=2 => -1 + 2/4 = -0.5
i=3 => -1 + 4/4 = 0
i=4 => -1 + 6/4 = 0.5
i=5 => -1 + 8/4 = 1

So the grid would look like
-1 | -0.5 | 0 | 0.5 | 1

instead of
-0.8 | -0.4 | 0 | 0.4 | 0.8

So my question is whether there is a reason to use the second grid instead of the first one. From what I can see, the first grid should also work. If there is interest in this change, I could try to implement it.

The advantage would be that the system will be able to regress coordinates on the border and not just very close to the border (depending on the heatmap dimensions)
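
To make the comparison concrete, here is a minimal sketch of the two grids in plain Python. The function names are hypothetical, and the open-range formula `(2i - (w + 1)) / w` is an assumption chosen because it reproduces the values -0.8 ... 0.8 quoted above; the closed-range formula is the one proposed in this comment.

```python
def grid_open(w):
    """Grid in the open range (-1, 1): one value per pixel centre.

    Assumed formula (2i - (w + 1)) / w for i = 1..w; it matches the
    -0.8 | -0.4 | 0 | 0.4 | 0.8 row quoted in the issue for w = 5.
    """
    return [(2 * i - (w + 1)) / w for i in range(1, w + 1)]


def grid_closed(w):
    """Grid in the closed range [-1, 1], as proposed in the issue:
    -1 + 2 * (i - 1) / (w - 1) for i = 1..w."""
    if w < 2:
        raise ValueError("the closed grid needs at least two columns")
    return [-1 + 2 * (i - 1) / (w - 1) for i in range(1, w + 1)]


print(grid_open(5))    # values close to -0.8, -0.4, 0, 0.4, 0.8
print(grid_closed(5))  # values -1, -0.5, 0, 0.5, 1
```

Both grids are evenly spaced and symmetric about zero; they differ only in the spacing (2/w versus 2/(w-1)) and therefore in what the values -1 and +1 refer to.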

@anibali
Owner

anibali commented Jun 23, 2019

Yes, I am aware of this distinction. In fact, my first implementation took the approach that you are proposing. In practice I don't believe there is much difference between the two approaches, and I happened to like the property that -1 is the far left and +1 is the far right when I created this particular library. There is currently an effort to move much of the functionality from this library into Kornia (kornia/kornia#167), which will once again use the scheme that you propose here. So there is no need to implement it yourself.

@anibali
Owner

anibali commented Jun 23, 2019

"The advantage would be that the system will be able to regress coordinates on the border and not just very close to the border (depending on the heatmap dimensions)"

I don't think that this is true. It should be the same either way.

@simonhessner
Author

simonhessner commented Jun 23, 2019

May I ask why you decided to use (-1,1) rather than [-1,1]? With the current implementation it is not possible to regress for example (x,y)=(1,1) (as the paper says), but what if you have a task where you need to be able to regress coordinates everywhere in the image?

Thanks for the hint with kornia! Did not know about that library :)

@anibali
Owner

anibali commented Jun 23, 2019

It doesn't make a difference because you should also change how you convert from pixel coordinates to normalised coordinates. So yes, the normalised value corresponding to pixel (0, 0) isn't (-1, -1), but it still maps to (0, 0) so it doesn't matter. You can use the normalized_to_pixel_coordinates and pixel_to_normalized_coordinates functions to help with that.

Example:
You have a 5x5 image like you describe in your first post. The model predicts location (-0.8, 0.8). Converting to pixels you get (0, 4). This is the last pixel of the first column, right in the corner. If we used the other representation, the model would predict (-1, 1) for the same location, but the end result would be the same because the conversion formula would be slightly different.
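
The example can be checked with a short sketch in plain Python. Both conversion functions below are hypothetical stand-ins, not the library's code: the open-range formula is an assumption chosen because it reproduces the numbers in this example, and the closed-range formula is the natural counterpart for a grid that includes -1 and 1.

```python
import math


def open_to_pixel(n, size):
    # Assumed (-1, 1) convention: -1 and +1 are the outer edges of the image,
    # so pixel centres sit strictly inside the range.
    return 0.5 * ((n + 1) * size - 1)


def closed_to_pixel(n, size):
    # Assumed [-1, 1] convention: -1 and +1 are the centres of the first
    # and last pixels.
    return 0.5 * (n + 1) * (size - 1)


size = 5
corner_open = (open_to_pixel(-0.8, size), open_to_pixel(0.8, size))
corner_closed = (closed_to_pixel(-1.0, size), closed_to_pixel(1.0, size))

# Both conventions land on the same corner pixel, (0, 4), up to rounding.
assert all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(corner_open, corner_closed))
```

The point of the sketch is that the normalised value and the conversion formula must be changed together; mixing a value from one convention with the formula from the other is what produces an offset.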

I hope that this clears things up.

@simonhessner
Author

simonhessner commented Oct 14, 2019

Hi,

As I see, DSNT has now been merged into Kornia. Is the version in Kornia "final", and should I switch to it?

For now I have a question about the normalized_to_pixel_coordinates function. The docs say:

"Coordinate tensor, where elements in the last dimension are ordered as (x, y, ..)"

I am not sure if I understand this correctly. For example, my tensors are shaped like this:

(BATCH_SIZE, N_LANDMARKS, 2) and the last dimension (2) contains x and y, each in its own "column". So there is no column that contains x1,y1,x2,y2,...,xn,yn. It seems to work, but I want to be sure it is in the correct format to avoid any bad results.

When trying this simple example:

dsntnn.normalized_to_pixel_coordinates(torch.tensor([-1.0, 1.0, 0.0]), (128))

I get:

tensor([ -0.5000, 127.5000, 63.5000])

Shouldn't it be [0.0, 128, 64] instead?

EDIT: Okay, my fault. I forgot that the coordinate range is (-1,1) and not [-1,1]... So my normalized coordinates are outside the valid range, thus the result is also outside the range. This works:

dsntnn.normalized_to_pixel_coordinates(dsntnn.pixel_to_normalized_coordinates(torch.tensor([128.0]), (128)), (128))

tensor([128.])
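
The numbers above can be reproduced with a minimal plain-Python sketch. The two functions are hypothetical re-implementations, with formulas assumed from the outputs shown in this comment rather than copied from the library. Under that assumption, -1 and +1 map to the outer edges of the image (half a pixel beyond the first and last pixel centres), and the two conversions are exact inverses of each other.

```python
def normalized_to_pixel(n, size):
    # Assumed formula; it reproduces -0.5, 127.5 and 63.5 for size 128.
    return 0.5 * ((n + 1) * size - 1)


def pixel_to_normalized(p, size):
    # Algebraic inverse of the formula above.
    return (2 * p + 1) / size - 1


size = 128
edges = [normalized_to_pixel(n, size) for n in (-1.0, 1.0, 0.0)]
print(edges)  # [-0.5, 127.5, 63.5]

# Centres of the first and last pixels lie strictly inside (-1, 1).
centres = (pixel_to_normalized(0, size), pixel_to_normalized(size - 1, size))
print(centres)  # (-0.9921875, 0.9921875)

# The round trip returns the input, even for a value outside the image.
round_trip = normalized_to_pixel(pixel_to_normalized(128.0, size), size)
print(round_trip)  # 128.0
```

Because both functions are linear, the round trip succeeds for any input; it does not by itself show that pixel 128 lies inside a 128-pixel-wide image, whose centres run from 0 to 127.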

@anibali
Owner

anibali commented Oct 15, 2019

The statement in the docs is simply referring to the order of the coordinates in the last dimension. Let's look at your example:

"(BATCH_SIZE, N_LANDMARKS, 2) and the last dimension (2) contains x and y, each in its own "column"."

Here there are two possibilities for ordering the last dimension (i.e. the order of the two "columns"): (x, y) or (y, x). The docs simply say that the first ordering is used. I felt that this was worth pointing out because it is the reverse of the ordering of the image dimensions, which are assumed to be height x width (not width x height).
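
A small sketch of what this ordering means in practice, using nested Python lists instead of tensors. The function name and the conversion formula are assumptions for illustration; the only point taken from the explanation above is that coordinates are (x, y) while the image size is (height, width), so the size has to be reversed before it is paired with the coordinates.

```python
def normalized_to_pixel_xy(coords, size_hw):
    """Convert (batch, landmarks, 2) nested lists of (x, y) points.

    size_hw is (height, width); it is flipped to (width, height) so that
    x is scaled by the width and y by the height.
    """
    height, width = size_hw
    size_xy = (width, height)
    return [
        [
            tuple(0.5 * ((c + 1) * s - 1) for c, s in zip(point, size_xy))
            for point in landmarks
        ]
        for landmarks in coords
    ]


# One batch element with two landmarks on a 64 (high) x 128 (wide) image.
batch = [[(0.0, 0.0), (-0.5, 0.5)]]
pixels = normalized_to_pixel_xy(batch, (64, 128))
print(pixels)  # [[(63.5, 31.5), (31.5, 47.5)]]
```

On a non-square image, forgetting the flip would scale x by the height and y by the width, which is easy to miss when testing only on square inputs.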

And yes, the implementation in Kornia is more polished than this one; just be aware that the normalisation of coordinates is different, as noted earlier.

anibali closed this as completed on Oct 15, 2019