
got error when I turn on mpp (Masked Patch Prediction) #18

Closed
GewelsJI opened this issue Aug 18, 2021 · 8 comments

@GewelsJI

Hi, @dandelin

When I turn on the mpp (Masked Patch Prediction), I get this error:

AttributeError: 'VisionTransformer' object has no attribute 'mask_token'

The error appears in vision_transformer.py. Could you please tell me how to address it?

Thank you for your help.

Best regards,
Ge-Peng.

@dandelin
Owner

Hi @GewelsJI

I've removed the MPP-related features from the model code, so the remaining MPP-related code is legacy.

If you want to test MPP, adding the following code at https://github.com/dandelin/ViLT/blob/master/vilt/modules/vision_transformer.py#L507 might solve the problem.

if config is not None and config["loss_names"]["mpp"] > 0:
    self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
    trunc_normal_(self.mask_token, std=0.02)
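
For context, here is a minimal sketch of how such a mask token is usually consumed during masked patch prediction. The function and tensor names below are illustrative, not the exact ViLT internals:

import torch

# Rough sketch (not the exact ViLT code): substitute a learned mask
# token for the embeddings of masked patches.
def apply_mask_token(x, mask, mask_token):
    # x: (batch, num_patches, embed_dim) patch embeddings
    # mask: (batch, num_patches) boolean, True where a patch is masked
    # mask_token: (1, 1, embed_dim) learned parameter
    w = mask.unsqueeze(-1).type_as(x)      # 1.0 where masked, 0.0 elsewhere
    return x * (1.0 - w) + mask_token * w  # broadcast the token over masked slots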

@GewelsJI
Author

Yes. It works. Thank you for your help.

@GewelsJI
Author

Hi @dandelin,

Sorry to bother you again. Could you please explain the parameters in this command:

python run.py with data_root=<ARROW_ROOT> num_gpus=<NUM_GPUS> num_nodes=<NUM_NODES> task_mlm_itm whole_word_masking=True step200k per_gpu_batchsize=<BS_FITS_YOUR_GPU>

I am not sure how to set num_nodes and per_gpu_batchsize for my cluster.

Thank you again.

Best regards,
Ge-Peng.

@dandelin
Owner

Hi @GewelsJI

The batch_size in the config file controls the total batch size used by a single optimization update.

The training script automatically falls back to gradient accumulation when it cannot run the full batch size at once. The actual batch size per single iteration is computed as _config["per_gpu_batchsize"] * num_gpus * _config["num_nodes"].

So num_nodes is the number of machines, num_gpus is the number of GPUs in each machine, and per_gpu_batchsize is the batch size assigned to a single GPU.
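
As a concrete example, here is a quick sanity check of the arithmetic (the numbers are illustrative, not the repo's defaults):

batch_size = 4096          # total batch size per optimization update (from config)
num_nodes = 1              # machines
num_gpus = 8               # GPUs per machine
per_gpu_batchsize = 64     # largest micro-batch that fits on one GPU

per_iteration = per_gpu_batchsize * num_gpus * num_nodes  # 512 samples per iteration
grad_accum_steps = batch_size // per_iteration            # 8 accumulated iterations per update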

@GewelsJI
Author

Ok, I get it. Thank you again.

Best,
Ge-Peng.

@GewelsJI
Author

Hi, @dandelin

I still have some questions about the objectives.py file:

What is the difference between compute_mpp, compute_mppd, and compute_mpfr? Could you explain what each of these functions does and when it is used?

Thank you again.

Best.

@dandelin
Owner

Hi, @GewelsJI

Those two objectives (MPPD and MPFR) are also legacies, like MPP.
Where MPP predicts the mean RGB value of a masked patch via classification, MPPD tries to regress that value directly.
MPFR is similar to MPPD but regresses the patch embedding (the output of the patch projection layer) instead.
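
In loss terms, the distinction might be sketched as follows; the function names and shapes are illustrative, not the exact objectives.py code:

import torch.nn.functional as F

def mpp_loss(logits, rgb_bin_targets):
    # MPP: classify the (discretized) mean RGB value of each masked patch.
    # logits: (batch, num_patches, num_bins), targets: (batch, num_patches)
    return F.cross_entropy(logits.flatten(0, 1), rgb_bin_targets.flatten())

def mppd_loss(pred_rgb, target_rgb):
    # MPPD: regress the patch's mean RGB value directly instead of classifying.
    return F.mse_loss(pred_rgb, target_rgb)

def mpfr_loss(pred_feat, patch_proj_target):
    # MPFR: regress the patch embedding (the output of the patch
    # projection layer) rather than raw pixel statistics.
    return F.mse_loss(pred_feat, patch_proj_target)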

@GewelsJI
Author

Got it! Thank you!
