I reproduced the data preprocessing and then trained the model with the electra-large-discriminator PLM and the msde local_and_nonlocal strategy.
I found that it takes around 50 minutes per epoch on a Tesla V100 32G with the same hyper-parameters as in the paper.
Besides, I made some modifications to use DDP with 4 GPUs, but the time per epoch only dropped to about 40 minutes.
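For reference, my DDP change looks roughly like the sketch below (a minimal, self-contained example launched with `torchrun --nproc_per_node=4`; the tiny linear model and random tensors are just placeholders standing in for the LGESQL model and the Spider data, not the actual repo code):

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each process
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # toy stand-ins for the LGESQL model and training set
    model = nn.Linear(128, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, shuffle=True)
    # per-GPU batch size = global batch size / world size, to keep the
    # effective batch size the same as the single-GPU setting
    loader = DataLoader(dataset, batch_size=5, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)          # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = loss_fn(model(x), y)
            optimizer.zero_grad()
            loss.backward()               # gradients all-reduced across the 4 GPUs
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```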
Is your training time about the same?
I want to run some experiments with the LGESQL base model, but the time cost is ..... [SAD]