
got error when I turn on mpp (Masked Patch Prediction) #18

Closed
GewelsJI opened this issue Aug 18, 2021 · 8 comments

@GewelsJI

Hi, @dandelin

When I turn on the mpp (Masked Patch Prediction), I get this error:

AttributeError: 'VisionTransformer' object has no attribute 'mask_token'

The error appears in vision_transformer.py. Could you please tell me how to address it?

Thank you for your help.

Best regards,
Ge-Peng.

@dandelin
Owner

Hi @GewelsJI

I've removed the MPP-related features from the model code, so the remaining MPP-related code is legacy.

If you want to test MPP, adding the following code at https://github.com/dandelin/ViLT/blob/master/vilt/modules/vision_transformer.py#L507 might solve the problem.

if config is not None and config["loss_names"]["mpp"] > 0:
    self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
    trunc_normal_(self.mask_token, std=0.02)
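
For context, here is a minimal sketch of how such a mask token is usually consumed during masked patch prediction. The function and tensor names below are illustrative, not the exact ViLT internals:

import torch

# Rough sketch (not the exact ViLT code): substitute a learned mask
# token for the embeddings of masked patches.
def apply_mask_token(x, mask, mask_token):
    # x: (batch, num_patches, embed_dim) patch embeddings
    # mask: (batch, num_patches) boolean, True where a patch is masked
    # mask_token: (1, 1, embed_dim) learned parameter
    w = mask.unsqueeze(-1).type_as(x)      # 1.0 where masked, 0.0 elsewhere
    return x * (1.0 - w) + mask_token * w  # broadcast the token over masked slots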

@GewelsJI
Author

Yes. It works. Thank you for your help.

@GewelsJI
Author

Hi @dandelin,

Sorry to bother you again. Could you please explain the parameters in this command:

python run.py with data_root=<ARROW_ROOT> num_gpus=<NUM_GPUS> num_nodes=<NUM_NODES> task_mlm_itm whole_word_masking=True step200k per_gpu_batchsize=<BS_FITS_YOUR_GPU>

I am not sure how to set num_nodes and per_gpu_batchsize for my cluster.

Thank you again.

Best regards,
Ge-Peng.

@dandelin
Owner

Hi @GewelsJI

The batch_size in the config file controls the total batch size used by a single optimization update.

The training script automatically falls back to gradient accumulation when it cannot run the full batch size at once. The actual batch size per single iteration is computed as _config["per_gpu_batchsize"] * num_gpus * _config["num_nodes"].

So num_nodes is the number of machines, num_gpus is the number of GPUs in each machine, and per_gpu_batchsize is the batch size assigned to a single GPU.
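
As a concrete example, here is a quick sanity check of the arithmetic (the numbers are illustrative, not the repo's defaults):

batch_size = 4096          # total batch size per optimization update (from config)
num_nodes = 1              # machines
num_gpus = 8               # GPUs per machine
per_gpu_batchsize = 64     # largest micro-batch that fits on one GPU

per_iteration = per_gpu_batchsize * num_gpus * num_nodes  # 512 samples per iteration
grad_accum_steps = batch_size // per_iteration            # 8 accumulated iterations per update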

@GewelsJI
Author

Ok, I get it. Thank you again.

Best,
Ge-Peng.

@GewelsJI
Author

Hi, @dandelin

I still have some questions about the objectives.py file:

What is the difference between compute_mpp, compute_mppd, and compute_mpfr? Could you explain what each of these functions does and when it is used?

Thank you again.

Best.

@dandelin
Owner

Hi, @GewelsJI

Those two objectives (MPPD and MPFR) are also legacies, like MPP.
Where MPP predicts the mean RGB value of a masked patch via classification, MPPD tries to regress that value directly.
MPFR is similar to MPPD but regresses the patch embedding (the output of the patch projection layer) instead.
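
In loss terms, the distinction might be sketched as follows; the function names and shapes are illustrative, not the exact objectives.py code:

import torch.nn.functional as F

def mpp_loss(logits, rgb_bin_targets):
    # MPP: classify the (discretized) mean RGB value of each masked patch.
    # logits: (batch, num_patches, num_bins), targets: (batch, num_patches)
    return F.cross_entropy(logits.flatten(0, 1), rgb_bin_targets.flatten())

def mppd_loss(pred_rgb, target_rgb):
    # MPPD: regress the patch's mean RGB value directly instead of classifying.
    return F.mse_loss(pred_rgb, target_rgb)

def mpfr_loss(pred_feat, patch_proj_target):
    # MPFR: regress the patch embedding (the output of the patch
    # projection layer) rather than raw pixel statistics.
    return F.mse_loss(pred_feat, patch_proj_target)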

@GewelsJI
Author

Got it! Thank you!
