Hyperparams for HRNet-48 #1

Closed
sborse3 opened this issue Oct 18, 2020 · 25 comments


@sborse3

sborse3 commented Oct 18, 2020

Could you please let me know the hyperparameters used to train the HRNet-48 model from your paper (both for the 45.7% mIoU and the ~49% mIoU scores)? I have tried really hard to train HRNet-48 on a single task in my repository, but it doesn't go beyond 44.8% mIoU.

Thank you.

@SimonVandenhende
Owner

SimonVandenhende commented Oct 18, 2020

Hi

I believe the model was trained with Adam (lr=1e-4, weight decay=1e-4). I trained for 100 epochs using batches of size 8.
I used the pre-trained ImageNet weights from the HRNet repository.
The following augmentations were used:
train_transforms = Compose([RandomHorizontallyFlip(), RandomRescale([1.0,1.2,1.5], (480,640))])

The augmentations differ a bit from the ones used in this repository. The random rescale was implemented as follows:

import random
from PIL import Image

class RandomRescale(object):
    def __init__(self, ratios, original_size):
        self.ratios = ratios
        # CenterCrop is a multi-input center crop defined elsewhere in the same transforms file.
        self.center_crop = CenterCrop(original_size)

    def __call__(self, img, mask, depth):
        # Pick a random scale factor and upsample all modalities accordingly.
        ratio = random.choice(self.ratios)
        w, h = img.size
        tw, th = int(ratio * w), int(ratio * h)

        img = img.resize((tw, th), Image.BILINEAR)
        mask = mask.resize((tw, th), Image.NEAREST)
        depth = depth.resize((tw, th), Image.NEAREST)

        # Crop back to the original resolution.
        img, mask, depth = self.center_crop(img, mask, depth)
        return img, mask, depth

Note that in this piece of code the images use the PIL format instead of the OpenCV format.
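For reference, here is a minimal sketch of the optimizer setup described above, assuming a standard PyTorch training loop (the model below is just a placeholder, not the actual HRNet-48):

import torch

# Placeholder network standing in for HRNet-48 (illustration only).
model = torch.nn.Conv2d(3, 40, kernel_size=1)

# Adam with lr=1e-4 and weight decay 1e-4; training ran for 100 epochs with batch size 8.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
num_epochs = 100
batch_size = 8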

Let me know if this is helpful.

@sborse3
Author

sborse3 commented Oct 18, 2020

Thank you for the response! I will try this. For this transform, did you use greyscale images? It seems like img.size returns only two values.

@SimonVandenhende
Owner

I did not include any color transformations like random grayscale or jitter. In this case, the img variable is a PIL Image object. The size function only returns the spatial resolution of the image for this class, and not the number of channels.
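As a quick illustration of that point (a standalone example, not code from the repository):

from PIL import Image

img = Image.new('RGB', (640, 480))  # a 3-channel RGB image
w, h = img.size                     # (640, 480): width and height only, no channel count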

@flamehaze1115

I did not include any color transformations like random grayscale or jitter. In this case, the img variable is a PIL Image object. The size function only returns the spatial resolution of the image for this class, and not the number of channels.

Hello. Thanks very much for releasing the code. Could you provide the config files for HRNet-48? I directly used the HRNet-18 config file with only the backbone changed, but I cannot reproduce the single-task (ST) and multi-task (MT) results from your paper. For the ST segmentation task, the mIoU is just 43%. For multi-task training, I use batch size 4 due to memory limits, but the multi-task learning performance on the test set is -26.31 compared with ST.
Could you provide the config files so the results can be reproduced easily?

@SimonVandenhende
Owner

SimonVandenhende commented Oct 22, 2020

Hi. I used the same hyperparameters as for the HRNet-18 models.
One thing you need to do is change the augmentations to make them consistent with the paper (see previous comments).
This should fix the issues I believe. I currently have no time to re-train the models myself. If the issue still persists after November 16th, I will consider retraining the bigger models and put them online as well.

@flamehaze1115

Hi. I used the same hyperparameters as for the HRNet-18 models.
One thing you need to do is change the augmentations to make them consistent with the paper (see previous comments).
This should fix the issues I believe. I currently have no time to re-train the models myself. If the issue still persists after November 16th, I will consider retraining the bigger models and put them online as well.

Thank you very much. Would you release your trained models for evaluation?

@SimonVandenhende
Owner

I will probably do this as people are asking for it. But as I said, this will only be after the 16th of November.

@kotetsu-n

Hi, I'm currently trying to reproduce the best result on NYUD-v2. I have read this issue and tried to use the same settings, but I couldn't figure them out.

You wrote that you used the following augmentations:

train_transforms = Compose([RandomHorizontallyFlip(), RandomRescale([1.0,1.2,1.5], (480,640))])

Could you clarify the settings a bit more? Your current code is set up to use the following transforms:

# Training transformations
    
# Horizontal flips with probability of 0.5
transforms_tr = [tr.RandomHorizontalFlip()]  # <- Modify only here? Or did you use only the transforms above?
    
# Rotations and scaling
transforms_tr.extend([tr.ScaleNRotate(rots=(-20, 20), scales=(.75, 1.25),
                                          flagvals={x: p.ALL_TASKS.FLAGVALS[x] for x in p.ALL_TASKS.FLAGVALS})])
# Fixed Resize to input resolution
transforms_tr.extend([tr.FixedResize(resolutions={x: tuple(p.TRAIN.SCALE) for x in p.ALL_TASKS.FLAGVALS},
                                         flagvals={x: p.ALL_TASKS.FLAGVALS[x] for x in p.ALL_TASKS.FLAGVALS})])
transforms_tr.extend([tr.AddIgnoreRegions(), tr.ToTensor(),
                          tr.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
transforms_tr = transforms.Compose(transforms_tr)

If you could describe the right settings in a bit more detail, I would be really grateful.

@SimonVandenhende
Owner

SimonVandenhende commented Dec 6, 2020

Hi. The augmentations used in this repo were implemented using the opencv (cv2) library.
The code excerpt above used the PIL library. So there is currently no support for the exact same implementation in this repository.
I will make some updates to the code repository this month, and will make sure to include it.

For the time being, I think you can use the train transforms I mentioned, and just add the ToTensor and Normalize operations.
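A rough sketch of what that suggestion amounts to, reusing the PIL-based Compose, RandomHorizontallyFlip and RandomRescale from the earlier comment together with the ToTensor and Normalize operations; whether these compose directly in the sample format used by the repository is an assumption:

train_transforms = Compose([
    RandomHorizontallyFlip(),
    RandomRescale([1.0, 1.2, 1.5], (480, 640)),
    ToTensor(),
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])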

@kotetsu-n

Hi, thank you for your reply. I know it needs some modifications, so I tried to implement the transforms using PIL, but the result was not the same as yours. I will test again using the settings from your reply. Also, I'm looking forward to seeing your updates!

@SimonVandenhende
Owner

I will rerun the code myself to make sure, but that will probably be towards the end of the month.
I have some other things that I need to take care of first.

@SimonVandenhende
Owner

I am working on fixing the issue this week.

@SimonVandenhende
Owner

SimonVandenhende commented Dec 16, 2020

I have made the code base consistent with the implementation used for the survey. The changes include the following:

  • Augmentations were adapted to use horizontal flips and random rescaling ([1.0, 1.2, 1.5]).
  • The depth is evaluated in a pixel-wise fashion, rather than by averaging per image as is the case in ASTMT.
  • The random rescaling operation also modifies the depth values. When zooming in, we divide the depth values by the scale.
  • The tasks are evaluated on the original NYUDv2 resolution. The data is made available through Google Drive, and is downloaded automatically when running the code for the first time.

At this point you should be able to get between 43.5 and 44.0 mIoU using ResNet-50.
If you still wish to use the data loading from ASTMT, you should replace the nyu.py file.
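To illustrate the depth handling described in the list above, a small sketch of the value correction, assuming the depth map is stored as a float32 numpy array (variable names are illustrative, not the repository's):

import random
import numpy as np

ratios = [1.0, 1.2, 1.5]
ratio = random.choice(ratios)
depth = np.ones((480, 640), dtype=np.float32)  # placeholder depth map in metres

# Zooming in by `ratio` brings the scene closer, so depth values shrink by the same factor.
depth = depth / ratio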

@TianhaoFu

Hi, thanks again for your open-source code.

I'm running your new code, and it seems your Google Drive link cannot be opened, so the data cannot be downloaded.

If you could fix it, I would be really grateful.

@SimonVandenhende
Owner

SimonVandenhende commented Dec 17, 2020

I see. I forgot to push the latest version of the nyud.py file. Should be fixed in the latest commit.
My apologies for the inconvenience.

@SimonVandenhende
Owner

Let me know if it works out now :)

@TianhaoFu

It works.

Thanks!

@prismformore

I have made the code base consistent with the implementation used for the survey. The changes include the following:

  • Augmentations were adapted to use horizontal flips and random rescaling ([1.0, 1.2, 1.5]).
  • The depth is evaluated in a pixel-wise fashion, rather than by averaging per image as is the case in ASTMT.
  • The random rescaling operation also modifies the depth values. When zooming in, we divide the depth values by the scale.
  • The tasks are evaluated on the original NYUDv2 resolution. The data is made available through Google Drive, and is downloaded automatically when running the code for the first time.

At this point you should be able to get between 43.5 and 44.0 mIoU using ResNet-50.
If you still wish to use the data loading from ASTMT, you should replace the nyu.py file.

May I know which config file we should use to achieve this result with ResNet-50? It looks like there is no MTI-Net config for ResNet-50. Thank you very much for your help.

@SimonVandenhende
Owner

I did not include code for MTI-Net with a ResNet-50 backbone; currently the code only supports an HRNet backbone for MTI-Net.
However, you could combine a ResNet-50 with a feature pyramid network to get a multi-scale feature representation from which MTI-Net can be run. The current code should give you the ResNet-50 results I included in the paper for the encoder-based models, though, as I ran them under the same conditions.
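As a rough sketch of that idea (an illustration of the general setup, not the repository's implementation), a ResNet-50 can be combined with torchvision's feature pyramid network to obtain a multi-scale feature representation:

import torch
from torchvision.models import resnet50
from torchvision.models._utils import IntermediateLayerGetter
from torchvision.ops import FeaturePyramidNetwork

# Expose the outputs of the four residual stages of ResNet-50.
body = IntermediateLayerGetter(resnet50(),
                               return_layers={'layer1': '0', 'layer2': '1',
                                              'layer3': '2', 'layer4': '3'})

# Project them to a common width with a feature pyramid network.
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048], out_channels=256)

x = torch.randn(2, 3, 480, 640)    # dummy batch of NYUD-sized inputs
multi_scale_feats = fpn(body(x))   # OrderedDict with four 256-channel feature maps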

@prismformore

@SimonVandenhende Thank you!

@TianhaoFu

Hi. I used the same hyperparameters as for the HRNet-18 models.
One thing you need to do is change the augmentations to make them consistent with the paper (see previous comments).
This should fix the issues I believe. I currently have no time to re-train the models myself. If the issue still persists after November 16th, I will consider retraining the bigger models and put them online as well.

Hi, I used batch size 8 to train MTI-Net on 2 tasks (with the same hyperparameters as for the HRNet-18 models), but I found that x_3_fpm['depth'].size() was [2, 384, 15, 20], in which the batch size is 2, not 8.

Could you explain this? Thanks a lot!

@SimonVandenhende
Owner

Hi. This could have to do with the specification of the number of backbone channels in utils/common_config.py, when adding the HRNet-48 backbone. You should make sure that this equals the number of channels that come out of the multi-scale feature representation generated by HRNet-48, which is different from the number of channels in the multi-scale feature representation from HRNet-18 (see line 34 in utils/common_config.py).
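For example, the channel specification would look something like the sketch below (the exact variable names in utils/common_config.py may differ):

# Widths of the multi-scale feature maps produced by each HRNet variant.
if backbone == 'hrnet_w18':
    backbone_channels = [18, 36, 72, 144]
elif backbone == 'hrnet_w48':
    backbone_channels = [48, 96, 192, 384]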

@TianhaoFu

TianhaoFu commented Dec 24, 2020

Hi. This could have to do with the specification of the number of backbone channels in utils/common_config.py, when adding the HRNet-48 backbone. You should make sure that this equals the number of channels that come out of the multi-scale feature representation generated by HRNet-48, which is different from the number of channels in the multi-scale feature representation from HRNet-18 (see line 34 in utils/common_config.py).

Hi. I used HRNet-48 channels == [48, 96, 192, 384] to train my network, but I still came across that problem. I think I set the right channels.

The other problem is that I trained HRNet-48 on four tasks with batch size 6 for 80 epochs, using your new data, but my semseg mIoU is around 45-46 and the depth RMSE is around 0.56-0.57.

In your paper the performance is mIoU 49 and RMSE 0.529. Could you please tell me where the problem in my training procedure is? I have tried really hard to train it.

Thank you so much! @SimonVandenhende

@SimonVandenhende
Owner

I tried the experiment using HRNet-48. I got about 45.5 mIoU for the single-tasking model, and 47.0 mIoU for MTI-Net. The multi-task learning improvement was about 2.9%. I think there are still some small differences with my old implementation used for the MTI-Net paper, which gave slightly better absolute numbers. Still, the conclusions from the paper are valid.

Also, I advise using the current implementation, as it is in line with the one used for the survey paper. This should give you a fair comparison between architectures, as I spent quite some time fine-tuning the hyperparameters for every method, while also making sure that other implementation details, like the augmentations, were the same among the different methods.
The current code base produces the results for the encoder-based approaches from the paper using ResNet-50/18, and for the decoder-based approaches using HRNet-18.

@TianhaoFu

TianhaoFu commented Feb 20, 2021

Hi, I noticed that in your latest HRNet-48 experiment the mIoU is 47.0.
I would like to know: in that latest experiment, what is the RMSE of the MTI-Net depth task?
In addition, in your latest HRNet-48 experiment, did you train with 2 tasks and all 4 auxiliary tasks?

Thanks! @SimonVandenhende
