
Training code #6

Open
datvtn opened this issue Mar 10, 2021 · 16 comments

Comments

@datvtn

datvtn commented Mar 10, 2021

Hi author,
Can you share H3R training code? I look forward to testing your model on 68 landmarks.
Thank you very much!

@ElteHupkes

Haha the only other issue here seems to be asking for the exact same thing. Ok maybe not the training code but definitely the encoder. From browsing the code it appears to be the Coordinate2BinaryHeatmap module that I'm personally most interested in (it's mentioned in a config file somewhere but it's definitely not in the code otherwise).

Maybe we can reconstruct it ourselves though? Because I'm impatient ;). I must admit I find the paper a little bit vague with regards to the actual encoding procedure. I'm trying to understand how this,

[screenshot of the paper's encoding formula, presumably equation (15)]

translates to actual encoding; is a random t drawn from U(0, 1) and is the ground truth point assigned to just a single point depending on that? That doesn't seem lossless as the rest of that page implies. Or do we take the 4 points resulting from ceiling and flooring the coordinates and assign their heatmap intensity according to their fractional error from the actual point? That seems lossless to me with the given encoder, but I don't directly get that from (15).
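To make that first reading concrete, here's a tiny sketch of it (my own, nothing from the paper or this repo): draw t from U(0, 1) and round each scaled coordinate accordingly, which is only lossless in expectation:

import torch

def random_round(px, py):
    # Round each (scaled) coordinate up with probability equal to its
    # fractional part, otherwise down. E[x] == px and E[y] == py, so the
    # encoding is exact on average but not for any single heatmap.
    nx, ny = int(px), int(py)
    ex, ey = px - nx, py - ny
    x = nx + int(torch.rand(1).item() < ex)
    y = ny + int(torch.rand(1).item() < ey)
    return x, y

# e.g. px = 16.69 rounds to 17 roughly 69% of the time and to 16 otherwise,
# so the average over many draws tends back to 16.69.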

@ElteHupkes

ElteHupkes commented Mar 15, 2021

I still don't know how the paper's encoder works exactly, but based on the heatmap2coord method in the code I created a coord2heatmap method that's essentially the inverse (minus heatmap scaling). Since heatmap2coord runs the topk through softmax and the softmax of those values is not the same as the values themselves, the function is not immediately reversible for the correct heatmap output values. I added an "inverse softmax" option to account for this case such that you can actually call heatmap2coord on the result of coord2heatmap to get the inverse. I believe PyTorch can backpropagate through topk though so I'm guessing in the actual loss function you could just do topk -> softmax -> mse. Anyhow, enough of my blabbering, here's the code:

import torch
from torch import tensor

def coord2heatmap(x, y, w, h, ow, oh, softmax=True):
    """
    Turns an (x, y) coordinate into a lossless heatmap. Arguments:
    x: x coordinate
    y: y coordinate
    w: original width
    h: original height
    ow: output heatmap width
    oh: output heatmap height
    softmax: prepare the heatmap so that decoding through a softmax
             (as heatmap2coord does) recovers the coordinate
    """
    # Get scale
    sx = ow / w
    sy = oh / h

    # Unrounded target points
    px = x * sx
    py = y * sy

    # Truncated coordinates
    nx, ny = int(px), int(py)

    # Coordinate error
    ex, ey = px - nx, py - ny

    # Multiplication factors required so the
    # heatmap intensities multiplied by their coordinates
    # result in the output coordinate.
    rr = tensor([[1 - ey, ey]]).T @ tensor([[1 - ex, ex]])

    heatmap = torch.zeros(oh, ow)
    if softmax:
        # Take the log of our coefficients so
        # we get our output predictions after
        # softmax.
        rr = torch.log(rr)

        # We can add any constant to rr for an
        # identical softmax, so we're making sure
        # every value is > 0. torch.min(rr) is
        # negative since everything is in (0, 1)
        rr -= torch.min(rr) - 0.5

    heatmap[ny:ny + 2, nx:nx + 2] = rr
    return heatmap

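For context, heatmap2coord is the decoder that already exists in this repo; I don't know its exact implementation, but here's a rough stand-in under my assumptions (topk over the flattened map, softmax, then the expected (x, y)) so the test below is self-contained:

import torch
import torch.nn.functional as F

def heatmap2coord(heatmap, topk=4):
    # Rough stand-in, NOT the repo's actual code: take the top-k activations
    # of each (H, W) map, softmax them, and return the expected (x, y).
    N, C, H, W = heatmap.shape
    score, index = heatmap.view(N, C, -1).topk(topk, dim=-1)
    coord = torch.stack([index % W, index // W], dim=-1).float()  # (N, C, k, 2) as (x, y)
    weight = F.softmax(score, dim=-1).unsqueeze(-1)               # (N, C, k, 1)
    return (coord * weight).sum(dim=-2)                           # (N, C, 2)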
Testing it:

# Some random points
w,h=2500,1500
x,y=163,1342
ow,oh=256,384

# Expected output:
expected = tensor([x*ow/w, y*oh/h])

# Note heatmap2coord expects N,C,H,W input, hence the view(). Also, the
# result is only exact here for topk=4 (the remaining entries are zeros
# that could be anywhere).
assert torch.allclose(expected, heatmap2coord(coord2heatmap(x, y, w, h, ow, oh, softmax=True).view(1, 1, oh, ow), topk=4))

One last thing: just realized that points within 1 pixel of the edge of the output heatmap will currently crash the code. Avoid those ;).
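If you do need points near the border, one possible tweak (just a sketch of one option; it makes those border points approximate rather than exactly lossless) is to clamp things right before the rr computation:

# Inside coord2heatmap, replacing the nx/ny/ex/ey lines:
eps = 1e-4
# keep the 2x2 window inside the heatmap...
nx = min(int(px), ow - 2)
ny = min(int(py), oh - 2)
# ...and keep the bilinear weights strictly inside (0, 1) so the torch.log()
# in the softmax branch never sees 0 or a negative value.
ex = min(max(px - nx, eps), 1.0 - eps)
ey = min(max(py - ny, eps), 1.0 - eps)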

@datvtn
Author

datvtn commented Mar 17, 2021

Same here, I still don't know exactly how the paper's encoder works, but your comments have helped me understand the problem better. Have you managed to reproduce it yet? If so, are the results good?

@ElteHupkes

I keep going down rabbit holes, but I'm hoping to run an experiment today or tomorrow. As I commented on another issue earlier today, the output of the head block in this repository seems to imply that CrossEntropyLoss was used during training, which would point to a binary heatmap. If, instead of the code I posted before, you do something like this:

# Randomly assign the "hot" bin to the floor or ceil neighbour in each
# direction, rounding up with probability equal to the fractional error.
xx = ex >= torch.rand(1)
yy = ey >= torch.rand(1)
rr = torch.zeros(2, 2)
rr[yy.long(), xx.long()] = 1

(use your imagination to decide where exactly this ends up please 😉): you randomly assign the point to one of the four neighbours with a probability based on how big the error is, so with enough heatmaps generated the average result will be the same. I'm going to try that, combined with cross entropy loss, on a simple dataset, and see what happens 🤷‍♂️.
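For what it's worth, the way I'd plug that into CrossEntropyLoss (a sketch under my own assumptions about the head output, not the repo's actual training code) is to treat each landmark's H x W map as a classification over its bins:

import torch
import torch.nn as nn

def binary_heatmap_ce_loss(logits, target_xy):
    # logits: (N, K, H, W) raw head outputs, one map per landmark (assumed layout).
    # target_xy: (N, K, 2) long tensor with the randomly rounded (x, y) bins.
    N, K, H, W = logits.shape
    target = target_xy[..., 1] * W + target_xy[..., 0]   # flatten (x, y) to a bin index
    return nn.functional.cross_entropy(logits.view(N * K, H * W),
                                       target.view(N * K))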

@vuthede

vuthede commented Mar 17, 2021

Hi @ElteHupkes,
Thanks for your interesting contributions to this repo so far; I am using part of your code in my training code 👯‍♂️
What if I use an MSE loss between the softmax of the predicted heatmap and the GT heatmap created by your code above (sketch below)?
  1. Does it make sense to do that, in your point of view?
  2. Do you think there is a huge difference in result quality when we use the different loss functions (MSE and cross entropy)?
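For clarity, what I mean is roughly this (just a sketch; heatmap_gt would be the map built by your coord2heatmap with softmax=False, whose four non-zero weights already sum to 1):

import torch.nn.functional as F

def softmax_mse_loss(logits, heatmap_gt):
    # logits: raw (N, K, H, W) network outputs; heatmap_gt: (N, K, H, W)
    # ground-truth maps from coord2heatmap(..., softmax=False).
    N, K, H, W = logits.shape
    pred = F.softmax(logits.view(N, K, -1), dim=-1).view(N, K, H, W)
    return F.mse_loss(pred, heatmap_gt)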

@ElteHupkes

ElteHupkes commented Mar 17, 2021

@vuthede Glad to help, hope you're building something great 🙃. Would you mind sharing your precise approach if you have something that works?

  1. I was thinking about trying that too, actually; my concern would be that there would be a significant difference between the softmax of the topk (which is used for inference) and the softmax of the whole plane (which would be what you use in training). But if the model simply learns to push the irrelevant activations to zero (after softmax) then it wouldn't matter at all. I'm not going to pretend I'm even remotely an expert in this, so I guess the best thing to do is just try!
  2. My understanding is that the reason Gaussian kernels are used as heatmap activations so often is that a binary heatmap with MSE won't train well, probably because of the massive imbalance in positive / negative activations. I tried training a model with too-tiny kernels a while back, and the best loss it was able to learn was simply making everything zero 😅. At least with cross entropy there's a relatively easy way to account for the class imbalance. Of course Gaussians don't make sense in this approach (I suppose we could center different-intensity Gaussians around the 4 candidates, but this seems far fetched to me), so that leaves either Cross Entropy or some other class balancing approach using MSE (see the sketch at the end of this comment); I think they're doing some fancy loss function magic in this paper but I haven't gone through it yet.

I'll be trying the cross entropy approach soon, if something comes out of it I'll report back.

EDIT: Point (2) also applies to point (1), if there isn't that much activation the model may just be tempted to learn a large field of zeros (they can't actually be zeros with softmax, but depending on the size of the heatmap everything can be pretty close to zero regardless).
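To be concrete about the class balancing in point (2): one option (my own sketch, not from the paper; pos_weight is just a knob to tune) is a positively weighted BCE on the binary maps:

import torch
import torch.nn as nn

# With a single positive bin out of H*W per landmark map, weight the
# positives up so "predict all zeros" stops being an attractive minimum.
H, W = 64, 64
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(float(H * W - 1)))

# logits and binary_target are both (N, K, H, W); binary_target has a
# single 1 per landmark map:
# loss = criterion(logits, binary_target.float())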

@vuthede

vuthede commented Mar 18, 2021

Hello @ElteHupkes,
Yeah, I've just trained with point (1) on LaPa to see how far it can go. Here is a quick demo.
When my face is frontal it is mostly OK, but when it comes to pose, or when part of the face is occluded, everything gets screwed up.
When training, I augment pretty much like the paper says; I use CoarseDropout and CoarsePepper from imgaug for the random-erase effect (rough sketch at the end of this comment).
Do you have any feeling for what leads to that result?

  • Lack of augmentation,
  • Or something fundamentally wrong with the encoding part in point (1).
    If you're happy to, I can share my code and we can figure out together what is going on.

EDIT: I just figured out that the box annotation policy the author mentioned also affects the result a lot.
[screenshot of the paper's box annotation policy]
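The random-erase style augmentation I mentioned above looks roughly like this (a sketch; the probabilities and patch sizes here are placeholders, not my actual settings):

import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Sometimes(0.5, iaa.CoarseDropout(0.02, size_percent=(0.05, 0.25))),
    iaa.Sometimes(0.5, iaa.CoarsePepper(0.05, size_percent=(0.05, 0.25))),
])

# image is an HxWx3 uint8 numpy array; dropout/pepper don't move pixels,
# so the landmark labels stay unchanged.
# augmented = augmenter.augment_image(image)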

@ElteHupkes

@vuthede I don't know... the fact that it trains at all is probably a good sign, but maybe the encoding mechanism has some inherent limitations. Just out of curiosity, what architecture are you using? Using any pretrained weights?

FWIW I just tried the Cross Entropy approach with HRNet last night on a much simpler dataset unrelated to facial landmarks, and it most definitely trains, got some pretty decent accuracy in just a few epochs without any pretrained weights. I've no experience with LaPa but maybe I can have a look at it later.

@vuthede

vuthede commented Mar 18, 2021

@ElteHupkes I used MobileNetV2 with an input size of 256x256 and ImageNet pretrained weights, as mentioned in the paper.
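In case it helps, a minimal sketch of that kind of backbone + head (only the MobileNetV2 features with ImageNet weights follow what I said above; the deconvolution head and 64x64 output are my own guesses, and 106 is the number of LaPa landmarks):

import torch
import torch.nn as nn
import torchvision

class MobileNetV2Heatmap(nn.Module):
    # 256x256 input -> MobileNetV2 features (N, 1280, 8, 8) -> three stride-2
    # deconvolutions up to one 64x64 heatmap per landmark.
    def __init__(self, num_landmarks=106):
        super().__init__()
        self.backbone = torchvision.models.mobilenet_v2(pretrained=True).features
        self.head = nn.Sequential(
            nn.ConvTranspose2d(1280, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_landmarks, kernel_size=1),
        )

    def forward(self, x):                      # x: (N, 3, 256, 256)
        return self.head(self.backbone(x))     # (N, num_landmarks, 64, 64)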

@PuNeal

PuNeal commented Mar 18, 2021

@vuthede @ElteHupkes Hello, I also implemented a function named coord2heatmap to construct a binary heatmap; the code is as follows:

import random

import torch

def coord2heatmap(coord, s, size):
    """
    coord: the ground-truth numerical coordinates of the landmarks, shape (K, 2) as (x, y)
    s: stride between input image and heatmap
    size: heatmap size as (K, H, W)
    """
    # coord[:, 0] = torch.clamp(coord[:, 0], min=0, max=(size[2] - 1))
    # coord[:, 1] = torch.clamp(coord[:, 1], min=0, max=(size[1] - 1))
    coord = coord / s
    coord_floor = torch.floor(coord).long()

    epsilon = coord - coord_floor
    epsilon_x = epsilon[:, 0]
    epsilon_y = epsilon[:, 1]
    heatmap = torch.zeros(size=size)
    for i in range(epsilon.size(0)):
        # Randomly round up with probability equal to the fractional part,
        # while keeping the index inside the heatmap.
        x = coord_floor[i, 0] if epsilon_x[i] < random.uniform(0, 1) else min(coord_floor[i, 0] + 1, size[2] - 1)
        y = coord_floor[i, 1] if epsilon_y[i] < random.uniform(0, 1) else min(coord_floor[i, 1] + 1, size[1] - 1)
        assert 0 <= x < size[2] and 0 <= y < size[1], "index ({}, {}) out of bounds".format(x, y)
        heatmap[i, y, x] = 1
    return heatmap.long()

I also tried to train a model on WFLW with binary cross entropy, but got pretty low accuracy. Can you share your code with us so we can figure it out together? @vuthede 😊

@ElteHupkes

@PuNeal I got sidetracked a bit, but my initial experiments with Binary Cross Entropy actually look rather promising on LaPa. I'm only experimenting with a fraction of the data and points though because I simply don't have the time / GPU power available to take it much further than that.

I've decided to write a blog post about this subject, which will cover heat map regression in general and this paper/repo in particular. I'm writing it up using the Jupyter notebook I've used for experimenting, so all my code will be in there. It'll hopefully be done today, I'll make sure to share when it's online.

@PuNeal

PuNeal commented Mar 19, 2021

@ElteHupkes Good job! I'm looking forward to it. :satisfied:

@ElteHupkes

Here it is: https://elte.me/2021-03-10-keypoint-regression-fastai. Bit rough around the edges, but you should be able to get the things I've tried from the part starting here: https://elte.me/2021-03-10-keypoint-regression-fastai#random-rounding-and-high-resolution-net. Let me know your thoughts!

@vuthede

vuthede commented Mar 21, 2021

Thanks @ElteHupkes, it is a great blog.
Hi @PuNeal, sorry for the late reply; I will share the code here: https://github.com/vuthede/heatmap-based-landmarker/
Currently I am using Adaptive Wing Loss with a Gaussian heatmap. Note: the mode (peak location) of the heatmap is randomized, mostly using @ElteHupkes's code above. After 70 epochs the NME is 1.92 (the author's is 1.69). Live tests show good results too.
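For reference, the NME above is computed roughly like this (a sketch; I'm assuming the usual inter-ocular normalization here, the exact reference distance used for LaPa may differ):

import numpy as np

def nme_percent(pred, gt, left_eye_idx, right_eye_idx):
    # pred, gt: (N, K, 2) arrays of (x, y) landmarks in pixels.
    # Mean per-point error, normalized by the inter-ocular distance.
    d = np.linalg.norm(gt[:, right_eye_idx] - gt[:, left_eye_idx], axis=-1)   # (N,)
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=1)                     # (N,)
    return float((err / d).mean() * 100.0)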

@ElteHupkes

@vuthede cool! You're using a MobileNetV2 I see, are you already running it on a phone? If so, how is it performing?

I'm really curious about this Adaptive Wing Loss, going to have a read through that later. Would be interesting to see how heatmaps + adaptive wing loss compare to binary maps + cross entropy.

@vuthede

vuthede commented Mar 22, 2021

@ElteHupkes Yeah, I am going to run it on a phone to see how it performs.
