Hello,
It would be great to have a training mode or argument that enables FP16 (mixed-precision) mode, for better performance when training on CUDA.
I use this training code for regular Hugging Face training:
```python
scaler = torch.cuda.amp.GradScaler()

def train_fp16(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        ids = data['ids'].to(device, dtype=torch.long)
        mask = data['mask'].to(device, dtype=torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
        targets = data['targets'].to(device, dtype=torch.float)

        optimizer.zero_grad()
        # Run the forward pass and loss in mixed precision.
        with torch.cuda.amp.autocast():
            outputs = model(ids, mask, token_type_ids)
            loss = loss_fn(outputs, targets)
        # if _ % 1000 == 0:
        #     print(f'Epoch: {epoch}, Loss: {loss.item()}')
        # Scale the loss, backprop, then unscale and step the optimizer.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

def train_fp32(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        ids = data['ids'].to(device, dtype=torch.long)
        mask = data['mask'].to(device, dtype=torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
        targets = data['targets'].to(device, dtype=torch.float)

        optimizer.zero_grad()
        outputs = model(ids, mask, token_type_ids)
        loss = loss_fn(outputs, targets)
        if _ % 1000 == 0:
            print(f'Epoch: {epoch}, Loss: {loss.item()}')
        loss.backward()
        optimizer.step()
```
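For reference, here is a minimal, self-contained sketch of the same AMP pattern that runs end to end. The tiny linear model, random data, and hyperparameters are placeholders, not part of the actual training setup above; `enabled=use_amp` makes `autocast`/`GradScaler` degrade to no-ops on CPU so the same loop works with or without CUDA:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
use_amp = torch.cuda.is_available()  # AMP only pays off on CUDA

# Placeholder model, optimizer, loss, and data for illustration only.
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(16, 8, device=device)
y = torch.randn(16, 1, device=device)

for step in range(3):
    optimizer.zero_grad()
    # Forward pass and loss under autocast (fp16 on CUDA, fp32 otherwise).
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = loss_fn(model(x), y)
    # Scaled backward pass; scaling is a pass-through when AMP is disabled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```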
Thanks!
Thanks for the suggestion! I'll look into it. I agree with adding more features to reduce memory consumption.
Once again, thanks for this suggestion. I just published a new version that allows you to enable fp16.