Hyperparams for HRNet-48 #1

Closed
sborse3 opened this issue Oct 18, 2020 · 25 comments


@sborse3

sborse3 commented Oct 18, 2020

Could you please let me know the hyperparameters used to train the HRNet-48 model from your paper (both for the 45.7% mIoU and the ~49% mIoU scores)? I have tried really hard to train HRNet-48 on a single task in my repository, but it doesn't go beyond 44.8% mIoU.

Thank you.

@SimonVandenhende
Owner

SimonVandenhende commented Oct 18, 2020

Hi

I believe the model was trained with Adam (lr=1e-4, weight decay=1e-4). I trained for 100 epochs using batches of size 8.
I used the pre-trained ImageNet weights from the HRNet repository.
The following augmentations were used:
train_transforms = Compose([RandomHorizontallyFlip(), RandomRescale([1.0,1.2,1.5], (480,640))])

The augmentations differ a bit from the ones used in this repository. The random rescale was implemented as follows:

import random
from PIL import Image

class RandomRescale(object):
    def __init__(self, ratios, original_size):
        self.ratios = ratios
        # CenterCrop is a multi-input center crop defined elsewhere in the same transforms file.
        self.center_crop = CenterCrop(original_size)

    def __call__(self, img, mask, depth):
        # Pick a random scale factor and upsample all modalities accordingly.
        ratio = random.choice(self.ratios)
        w, h = img.size
        tw, th = int(ratio * w), int(ratio * h)

        img = img.resize((tw, th), Image.BILINEAR)
        mask = mask.resize((tw, th), Image.NEAREST)
        depth = depth.resize((tw, th), Image.NEAREST)

        # Crop back to the original resolution.
        img, mask, depth = self.center_crop(img, mask, depth)
        return img, mask, depth

Note that in this piece of code the images use the PIL format instead of the OpenCV format.
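For reference, here is a minimal sketch of the optimizer setup described above, assuming a standard PyTorch training loop (the model below is just a placeholder, not the actual HRNet-48):

import torch

# Placeholder network standing in for HRNet-48 (illustration only).
model = torch.nn.Conv2d(3, 40, kernel_size=1)

# Adam with lr=1e-4 and weight decay 1e-4; training ran for 100 epochs with batch size 8.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
num_epochs = 100
batch_size = 8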

Let me know if this is helpful.

@sborse3
Author

sborse3 commented Oct 18, 2020

Thank you for the response! I will try this. For this transform, did you use greyscale images? It seems like img.size returns only two values.

@SimonVandenhende
Owner

I did not include any color transformations like random grayscale or jitter. In this case, the img variable is a PIL Image object. The size function only returns the spatial resolution of the image for this class, and not the number of channels.
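As a quick illustration of that point (a standalone example, not code from the repository):

from PIL import Image

img = Image.new('RGB', (640, 480))  # a 3-channel RGB image
w, h = img.size                     # (640, 480): width and height only, no channel count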

@flamehaze1115

I did not include any color transformations like random grayscale or jitter. In this case, the img variable is a PIL Image object. The size function only returns the spatial resolution of the image for this class, and not the number of channels.

Hello. Thanks very much for releasing the code. Could you provide the config files for HRNet-48? I directly used the HRNet-18 config file with only the backbone changed, but I cannot reproduce the single-task (ST) and multi-task (MT) results from your paper. For the ST segmentation task, the mIoU is just 43%. For multi-task training, I use batch size 4 due to memory limits, but the multi-task learning performance on the test set is -26.31 compared with ST.
Could you provide the config files so the results can be reproduced easily?

@SimonVandenhende
Owner

SimonVandenhende commented Oct 22, 2020

Hi. I used the same hyperparameters as for the HRNet-18 models.
One thing you need to do is change the augmentations to make them consistent with the paper (see previous comments).
This should fix the issues I believe. I currently have no time to re-train the models myself. If the issue still persists after November 16th, I will consider retraining the bigger models and put them online as well.

@flamehaze1115

Hi. I used the same hyperparameters as for the HRNet-18 models.
One thing you need to do is change the augmentations to make them consistent with the paper (see previous comments).
This should fix the issues I believe. I currently have no time to re-train the models myself. If the issue still persists after November 16th, I will consider retraining the bigger models and put them online as well.

Thank you very much. Would you release your trained models for evaluation?

@SimonVandenhende
Owner

I will probably do this as people are asking for it. But as I said, this will only be after the 16th of November.

@kotetsu-n

Hi, I'm currently trying to reproduce the best result on NYUD-v2. I have read this issue and tried to use the same settings, but I couldn't figure them out.

You wrote that you used the following augmentations:

train_transforms = Compose([RandomHorizontallyFlip(), RandomRescale([1.0,1.2,1.5], (480,640))])

Could you clarify the settings a bit more? Your current code is set up to use the following transforms:

# Training transformations
    
# Horizontal flips with probability of 0.5
transforms_tr = [tr.RandomHorizontalFlip()]  # <- Modify only here? Or did you use only the transforms above?
    
# Rotations and scaling
transforms_tr.extend([tr.ScaleNRotate(rots=(-20, 20), scales=(.75, 1.25),
                                          flagvals={x: p.ALL_TASKS.FLAGVALS[x] for x in p.ALL_TASKS.FLAGVALS})])
# Fixed Resize to input resolution
transforms_tr.extend([tr.FixedResize(resolutions={x: tuple(p.TRAIN.SCALE) for x in p.ALL_TASKS.FLAGVALS},
                                         flagvals={x: p.ALL_TASKS.FLAGVALS[x] for x in p.ALL_TASKS.FLAGVALS})])
transforms_tr.extend([tr.AddIgnoreRegions(), tr.ToTensor(),
                          tr.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
transforms_tr = transforms.Compose(transforms_tr)

If you could describe the right settings in a bit more detail, I would be really grateful.

@SimonVandenhende
Owner

SimonVandenhende commented Dec 6, 2020

Hi. The augmentations used in this repo were implemented using the opencv (cv2) library.
The code excerpt above used the PIL library. So there is currently no support for the exact same implementation in this repository.
I will make some updates to the code repository this month, and will make sure to include it.

For the time being, I think you can use the train transforms I mentioned, and just add the ToTensor and Normalize operations.
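A rough sketch of what that suggestion amounts to, reusing the PIL-based Compose, RandomHorizontallyFlip and RandomRescale from the earlier comment together with the ToTensor and Normalize operations; whether these compose directly in the sample format used by the repository is an assumption:

train_transforms = Compose([
    RandomHorizontallyFlip(),
    RandomRescale([1.0, 1.2, 1.5], (480, 640)),
    ToTensor(),
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])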

@kotetsu-n

Hi, thank you for your reply. I know it needs some modifications, so I tried to implement the transforms using PIL, but the result was not the same as yours. I will test again using the settings from your reply. Also, I'm looking forward to seeing your updates!

@SimonVandenhende
Owner

I will rerun the code myself to make sure, but that will probably be towards the end of the month.
I have some other things that I need to take care of first.

@SimonVandenhende
Owner

I am working on fixing the issue this week.

@SimonVandenhende
Owner

SimonVandenhende commented Dec 16, 2020

I have made the code base consistent with the implementation used for the survey. The changes include the following:

  • Augmentations were adapted to use horizontal flips and random rescaling ([1.0, 1.2, 1.5]).
  • The depth is evaluated in a pixel-wise fashion, rather than by averaging per image as is the case in ASTMT.
  • The random rescaling operation also modifies the depth values. When zooming in, we divide the depth values by the scale.
  • The tasks are evaluated on the original NYUDv2 resolution. The data is made available through Google Drive, and is downloaded automatically when running the code for the first time.

At this point you should be able to get between 43.5 and 44.0 mIoU using ResNet-50.
If you still wish to use the data loading from ASTMT, you should replace the nyu.py file.
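To illustrate the depth handling described in the list above, a small sketch of the value correction, assuming the depth map is stored as a float32 numpy array (variable names are illustrative, not the repository's):

import random
import numpy as np

ratios = [1.0, 1.2, 1.5]
ratio = random.choice(ratios)
depth = np.ones((480, 640), dtype=np.float32)  # placeholder depth map in metres

# Zooming in by `ratio` brings the scene closer, so depth values shrink by the same factor.
depth = depth / ratio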

@TianhaoFu

Hi, thanks again for your open-source code.

I'm running your new code, and it seems your Google Drive link cannot be opened, so the data cannot be downloaded.

If you could fix it, I would be really grateful.

@SimonVandenhende
Owner

SimonVandenhende commented Dec 17, 2020

I see. I forgot to push the latest version of the nyud.py file. Should be fixed in the latest commit.
My apologies for the inconvenience.

@SimonVandenhende
Owner

Let me know if it works out now :)

@TianhaoFu

It works.

Thanks!

@prismformore

I have made the code base consistent with the implementation used for the survey. The changes include the following:

  • Augmentations were adapted to use horizontal flips and random rescaling ([1.0, 1.2, 1.5]).
  • The depth is evaluated in a pixel-wise fashion, rather than by averaging per image as is the case in ASTMT.
  • The random rescaling operation also modifies the depth values. When zooming in, we divide the depth values by the scale.
  • The tasks are evaluated on the original NYUDv2 resolution. The data is made available through Google Drive, and is downloaded automatically when running the code for the first time.

At this point you should be able to get between 43.5 and 44.0 mIoU using ResNet-50.
If you still wish to use the data loading from ASTMT, you should replace the nyu.py file.

May I know which config file we should use to achieve this result with ResNet-50? It looks like there is no MTI-Net config for ResNet-50. Thank you very much for your help.

@SimonVandenhende
Owner

I did not include code for MTI-Net with a ResNet-50 backbone; currently the code only supports an HRNet backbone for MTI-Net.
However, you could combine a ResNet-50 with a feature pyramid network to get a multi-scale feature representation from which MTI-Net can be run. The current code should give you the ResNet-50 results I included in the paper for the encoder-based models, though, as I ran them under the same conditions.
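As a rough sketch of that idea (an illustration of the general setup, not the repository's implementation), a ResNet-50 can be combined with torchvision's feature pyramid network to obtain a multi-scale feature representation:

import torch
from torchvision.models import resnet50
from torchvision.models._utils import IntermediateLayerGetter
from torchvision.ops import FeaturePyramidNetwork

# Expose the outputs of the four residual stages of ResNet-50.
body = IntermediateLayerGetter(resnet50(),
                               return_layers={'layer1': '0', 'layer2': '1',
                                              'layer3': '2', 'layer4': '3'})

# Project them to a common width with a feature pyramid network.
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048], out_channels=256)

x = torch.randn(2, 3, 480, 640)    # dummy batch of NYUD-sized inputs
multi_scale_feats = fpn(body(x))   # OrderedDict with four 256-channel feature maps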

@prismformore

@SimonVandenhende Thank you!

@TianhaoFu

Hi. I used the same hyperparameters as for the HRNet-18 models.
One thing you need to do is change the augmentations to make them consistent with the paper (see previous comments).
This should fix the issues I believe. I currently have no time to re-train the models myself. If the issue still persists after November 16th, I will consider retraining the bigger models and put them online as well.

Hi, I used batch size 8 to train MTI-Net on 2 tasks (with the same hyperparameters as for the HRNet-18 models), but I found that x_3_fpm['depth'].size() was [2, 384, 15, 20], in which the batch size is 2, not 8.

Could you explain this? Thanks a lot!

@SimonVandenhende
Owner

Hi. This could have to do with the specification of the number of backbone channels in utils/common_config.py, when adding the HRNet-48 backbone. You should make sure that this equals the number of channels that come out of the multi-scale feature representation generated by HRNet-48, which is different from the number of channels in the multi-scale feature representation from HRNet-18 (see line 34 in utils/common_config.py).
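For example, the channel specification would look something like the sketch below (the exact variable names in utils/common_config.py may differ):

# Widths of the multi-scale feature maps produced by each HRNet variant.
if backbone == 'hrnet_w18':
    backbone_channels = [18, 36, 72, 144]
elif backbone == 'hrnet_w48':
    backbone_channels = [48, 96, 192, 384]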

@TianhaoFu

TianhaoFu commented Dec 24, 2020

Hi. This could have to do with the specification of the number of backbone channels in utils/common_config.py, when adding the HRNet-48 backbone. You should make sure that this equals the number of channels that come out of the multi-scale feature representation generated by HRNet-48, which is different from the number of channels in the multi-scale feature representation from HRNet-18 (see line 34 in utils/common_config.py).

Hi. I used HRNet-48 channels == [48, 96, 192, 384] to train my network, but I still came across that problem. I think I set the right channels.

The other problem is that I trained HRNet-48 on four tasks with batch size 6 for 80 epochs, using your new data, but my semseg mIoU is around 45-46 and the depth RMSE is around 0.56-0.57.

In your paper the performance is mIoU 49 and RMSE 0.529. Could you please tell me where the problem in my training procedure is? I have tried really hard to train it.

Thank you so much! @SimonVandenhende

@SimonVandenhende
Owner

I tried the experiment using HRNet-48. I got about 45.5 mIoU for the single-tasking model, and 47.0 mIoU for MTI-Net. The multi-task learning improvement was about 2.9%. I think there are still some small differences with my old implementation used for the MTI-Net paper, which gave slightly better absolute numbers. Still, the conclusions from the paper are valid.

Also, I advise using the current implementation, as it is in line with the one used for the survey paper. This should give you a fair comparison between architectures, as I spent quite some time fine-tuning the hyperparameters for every method, while also making sure that other implementation details, like the augmentations, were the same among the different methods.
The current code base produces the results for the encoder-based approaches from the paper using ResNet-50/18, and for the decoder-based approaches using HRNet-18.

@TianhaoFu

TianhaoFu commented Feb 20, 2021

Hi, I noticed that in your latest HRNet-48 experiment the mIoU is 47.0.
I would like to know: in that latest experiment, what is the RMSE of the MTI-Net depth task?
In addition, in your latest HRNet-48 experiment, did you train with 2 tasks and all 4 auxiliary tasks?

Thanks! @SimonVandenhende
