
Is it possible to enable apex O1 for inferences on a non-apex FP32 trained model? #750

Open
Damiox opened this issue Mar 10, 2020 · 13 comments


@Damiox

Damiox commented Mar 10, 2020

Can I use apex at inference time on a pure FP32 model that was not trained with apex?
Does apex strictly require, for inference, a model that was originally trained with apex enabled? It's still not clear to me.

Could I get some explanation about that? I can't find the answer in the docs.

@Lornatang

@Damiox
First, make sure you have an optimizer to pass to amp.initialize alongside your model.
If the following code runs without any problems, you are set.

# Initialization
from apex import amp
import torch

opt_level = 'O2'  # "almost FP16" mixed precision; use 'O0' for pure FP32 or 'O1' for dynamic mixed precision
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

# Restore a checkpoint that was saved together with the amp state
model = ...
optimizer = ...
checkpoint = torch.load('checkpoint.pth')

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])
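
For inference only, amp.initialize can also be called without an optimizer (the optimizers argument is optional when nothing is being trained). A minimal sketch, assuming an already-loaded FP32 model and a CUDA-capable GPU; the names model and inputs are placeholders:

from apex import amp
import torch

# Cast an already-trained FP32 model for mixed-precision inference only
model = model.cuda().eval()
model = amp.initialize(model, opt_level='O1')

with torch.no_grad():
    output = model(inputs.cuda())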

@Damiox
Copy link
Author

Damiox commented Mar 12, 2020

Hey @Lornatang. This is my current code:

try:
    from apex import amp
    from transformers.optimization import AdamW
    optimizer = AdamW(self.model.parameters())
    self.model, optimizer = amp.initialize(self.model, optimizer, opt_level='O1')
except ImportError:
    print("NVIDIA's apex library is not installed. Automatic Mixed Precision cannot be enabled.")

The optimizer I'm using is this one: https://huggingface.co/transformers/main_classes/optimizer_schedules.html

My questions below:

  • Why would I need to use O2 instead of O1 for my use case?
  • Why should I restore the checkpoint?
  • Isn't my code good enough for my use case? Please remember that my model has not been trained with mixed precision. It's FP32, but I'm trying to use apex only at inference time. I can't find anything in the documentation about that option. Is it not supported?

@Lornatang

@Damiox

  • I think your problem is about full precision versus dynamic mixed precision, not about how the model was trained. If you set O1, it will work better.
  • If you load a pretrained model directly, you do not need to restore a checkpoint; otherwise you must point to the location of the model weights in your directory.
  • Your sample code is correct and can be used as-is.

@Damiox
Author

Damiox commented Mar 12, 2020

@Lornatang Actually, my problem is that I'm using a model that was not trained with mixed precision; it's FP32. I'm running inference faster by using apex with the O1 level at inference time for this model. I don't see many discrepancies, but I'm not sure whether what I'm doing is right. I can't find anything in the documentation saying whether that's OK. Do you know where I can confirm that?
Based on what you say, then, training with mixed precision is not a requirement for using apex later at inference time? I can grab any FP32 model and run inference with apex, right? Thanks

@Lornatang

@Damiox
Yes, you can

@Damiox
Author

Damiox commented Mar 13, 2020

@Lornatang Thanks for helping me out with this. Could you please elaborate on why this should work? What's the theory behind it? Thanks

@Lornatang

@Damiox
Apex's key features: mixed-precision training + dynamic loss scaling.

1. The essence of mixed-precision training is to use FP16 for storage and multiplication in memory (to speed up computation) while using FP32 for accumulation (to avoid rounding error). This strategy effectively mitigates the rounding-error problem.

2. Loss scaling is needed because mixed-precision training can otherwise fail to converge: the activation gradients are too small and underflow in FP16. The idea of loss scaling is as follows (see the sketch after this list):

  • Before backpropagation, the loss (dloss) is manually multiplied by 2^k, so the intermediate values (activation gradients) obtained during backpropagation do not underflow;
  • After backpropagation, the weight gradients are divided by 2^k, returning them to their normal values.
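
In apex this scaling is handled by amp.scale_loss. A minimal training-step sketch, assuming model, optimizer, criterion, and loader already exist and amp.initialize has been called (the names are placeholders):

from apex import amp

# model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
for inputs, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    # scale_loss multiplies the loss by the current (dynamic) scale factor;
    # the gradients are unscaled again before optimizer.step()
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()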

@Damiox
Author

Damiox commented Mar 13, 2020

Just to be 100% sure we're on the same page here.
I am using an existing model that was originally trained in FP32 without apex. I am only running inference with that model, not re-training it, and I'm using apex at inference time purely to speed things up. I'm not interested in anything about apex + training because I cannot re-train this model. Thanks

@Damiox
Author

Damiox commented Mar 16, 2020

@Lornatang I'm sorry to ping you again, but I just want to make sure my point is clear. Is it wrong to initialize apex for inference on an existing FP32 model that I haven't re-trained with apex? Everywhere in the documentation it seems to be assumed that the model is trained with apex, but I'm not re-training my model with apex; I'm just using apex when running predictions with it. I just wanted to clarify this and get some feedback from you. Thanks!

@Lornatang

Lornatang commented Mar 16, 2020

@Damiox
Sorry.
Apex inference can be done on any FP32-precision model.
For example, you can load PyTorch's pretrained vgg19 model, which was trained in FP32, initialize it in the same way with the code I gave earlier, and run inference.
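
A minimal sketch of that vgg19 example, assuming torchvision is installed and a CUDA GPU is available:

import torch
import torchvision.models as models
from apex import amp

# Load an FP32-pretrained model and cast it for mixed-precision inference
model = models.vgg19(pretrained=True).cuda().eval()
model = amp.initialize(model, opt_level='O1')

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224, device='cuda')
    out = model(dummy)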

@kwanUm

kwanUm commented Sep 2, 2020

@Damiox I'm trying the same thing (trained in FP32, inference with apex). Unfortunately, I see inference time getting slower. How much of a speedup have you observed using apex only at inference time?

@Damiox
Author

Damiox commented Sep 2, 2020

> @Damiox I'm trying the same thing (trained in FP32, inference with apex). Unfortunately, I see inference time getting slower. How much of a speedup have you observed using apex only at inference time?

A considerable speedup, indeed. Take a look at your GPU: not all GPUs benefit from FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.

@BuaaAlban

> @Damiox I'm trying the same thing (trained in FP32, inference with apex). Unfortunately, I see inference time getting slower. How much of a speedup have you observed using apex only at inference time?
>
> A considerable speedup, indeed. Take a look at your GPU: not all GPUs benefit from FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.

I have run inference with an FP32 model on a Tesla T4, but I haven't gotten any speedup. How can I confirm that Tensor Cores are being used? Can you help me?
I have changed the model with

model = amp.initialize(model, opt_level='O3')

and changed the input of the model with

t_audio_signal_e = t_audio_signal_e.to(torch.half).cuda()

Thanks
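
One way to check whether the cast actually pays off is to time the FP32 and FP16 paths directly with CUDA events. A minimal sketch, assuming model_fp32, model_amp, x_fp32, and x_fp16 are defined elsewhere (these names are just placeholders):

import torch

def time_inference(model, x, iters=100):
    # Warm up, then time with CUDA events so the GPU work is measured accurately
    with torch.no_grad():
        for _ in range(10):
            model(x)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per forward pass

# Compare the plain FP32 model against the amp-initialized one
print('FP32 ms/iter:', time_inference(model_fp32, x_fp32))
print('AMP  ms/iter:', time_inference(model_amp, x_fp16))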
