Is it possible to train BERT? #3
The authors plan on releasing the full pre-trained model in a few weeks. There will then be the task of loading their model weights into PyTorch. Perhaps ONNX will work for getting the weights out of TF and into PT? Once the weights have been loaded, it should be possible to validate the fine-tuning results.
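Besides ONNX, another common route is to copy the TF checkpoint variables into a PyTorch `state_dict` by name. A minimal sketch of just the name-mapping step, in plain Python: the TF-side names follow the convention of the released google-research/bert checkpoints, while the PyTorch-side names are hypothetical and would have to match whatever the actual model definition uses.

```python
# Sketch: map TensorFlow BERT variable names to PyTorch parameter names.
# The actual conversion would also transpose dense "kernel" matrices
# (TF stores them transposed relative to torch.nn.Linear.weight).
TF_TO_PT = {
    "kernel": "weight",  # dense layer weights
    "gamma": "weight",   # layer-norm scale
    "beta": "bias",      # layer-norm shift
}

def tf_to_pt_name(tf_name: str) -> str:
    """E.g. 'bert/encoder/layer_0/attention/self/query/kernel'
    -> 'encoder.layer.0.attention.self.query.weight' (hypothetical scheme)."""
    parts = tf_name.split("/")
    if parts[0] == "bert":
        parts = parts[1:]
    out = []
    for p in parts:
        if p.startswith("layer_"):
            out.extend(["layer", p[len("layer_"):]])
        else:
            out.append(TF_TO_PT.get(p, p))
    return ".".join(out)

print(tf_to_pt_name("bert/encoder/layer_0/attention/self/query/kernel"))
# encoder.layer.0.attention.self.query.weight
```

With a table like this, loading reduces to iterating over the checkpoint variables, renaming, transposing kernels, and calling `model.load_state_dict`.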
@briandw Well, I sent an email to the authors, and they told me the same thing.
I can try to import the tensor2tensor model into PT. https://github.com/tensorflow/tensor2tensor
@codertimo Should the goal be to train BERT from scratch or to fine-tune the model? I'd say that training from scratch isn't realistic right now. Fine-tuning shouldn't be that resource-intensive and would be very valuable.
@briandw Thank you for your advice. Currently my goal is training from scratch with a smaller model that can be trained on our GPU environment, because I want to keep this implementation usable for anyone who needs to train on their own specific domain or language. But as you said, moving a model trained in TF over to PyTorch is another goal of this project, so I'd like to implement the transfer code for loading the pretrained model too. I'll make a plan and notify you all when the pretrained model and the official BERT implementation come out.
Does this code support distributed training? I mean multi-machine with multi-GPU...
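For context: multi-machine/multi-GPU training in PyTorch is typically done with `DistributedDataParallel` plus a `DistributedSampler` that gives each process a disjoint slice of the dataset. A pure-Python sketch of that sharding idea (the helper name and round-robin scheme below are my own illustration of what `torch.utils.data.DistributedSampler` does; the real sampler also pads so every rank gets an equal-length shard):

```python
def shard_indices(num_examples: int, rank: int, world_size: int) -> list:
    """Round-robin shard of dataset indices for one worker: rank r of
    world_size w sees indices r, r + w, r + 2w, ... (illustrative only)."""
    return list(range(rank, num_examples, world_size))

# With 10 examples and 4 workers, every index is covered exactly once:
shards = [shard_indices(10, r, 4) for r in range(4)]
assert sorted(i for s in shards for i in s) == list(range(10))
```

Each process would then wrap its model in `DistributedDataParallel`, which averages gradients across ranks after every backward pass.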
@codertimo Have you already trained this model on a small dataset? If so, would you share some info about it? For example, what if we used p2.8xlarge GPUs to train on a 1M-example dataset from scratch? (Thanks for the wonderful work, BTW)
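A back-of-envelope way to answer such questions: steps per epoch = total tokens / (batch size × sequence length), and wall time = steps / measured throughput. Every concrete number below is a hypothetical placeholder, not a p2.8xlarge measurement; the point is the arithmetic, not the values.

```python
# Back-of-envelope pre-training time estimate. All numbers here are
# placeholder assumptions -- substitute your own measured throughput.
tokens = 1_000_000 * 128        # 1M examples x 128 tokens each (assumed)
batch_size = 256                # global batch across all GPUs (assumed)
seq_len = 128                   # sequence length (assumed)
steps_per_epoch = tokens // (batch_size * seq_len)
steps_per_sec = 2.0             # assumed measured training throughput
hours_per_epoch = steps_per_epoch / steps_per_sec / 3600
print(f"{steps_per_epoch} steps/epoch, ~{hours_per_epoch:.2f} h/epoch")
```

Under these made-up assumptions an epoch takes about half an hour; the only way to get a real answer is to time a few hundred steps on the actual hardware.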
Is it possible to achieve the same results as the paper in a short time?
Well... I don't have enough GPUs and computation power to get results comparable to Google AI's.
If we can't train on the full corpus as Google did, then how can we verify that this code is correct?
Training on a 256M-word corpus without Google-AI-class GPU computation is nearly impossible for me.
If you have any thoughts (e.g. reducing the model size), please let me know!
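On reducing the model size: a quick way to pick a smaller config is to estimate the parameter count as a function of hidden size, layer count, and vocabulary size. The rough formula below (token/position/segment embeddings plus per-layer attention and 4H feed-forward weights, with biases and layer norms) is my own approximation of the BERT-base architecture described in the paper, not code from this repo; the defaults match BERT-base and land near its published ~110M parameters.

```python
def bert_param_count(vocab=30522, hidden=768, layers=12,
                     max_pos=512, ffn_mult=4, segments=2):
    """Rough BERT-style parameter count (embeddings + encoder layers)."""
    H = hidden
    embed = (vocab + max_pos + segments) * H + 2 * H   # tok/pos/seg + LN
    attn = 4 * (H * H + H) + 2 * H                     # Q, K, V, out + LN
    ffn = (H * ffn_mult * H + ffn_mult * H             # intermediate
           + ffn_mult * H * H + H                      # output projection
           + 2 * H)                                    # LN
    return embed + layers * (attn + ffn)

print(f"base:  {bert_param_count() / 1e6:.0f}M")
print(f"small: {bert_param_count(hidden=256, layers=4) / 1e6:.0f}M")
```

Shrinking the hidden size to 256 and the depth to 4 layers cuts the model to roughly a tenth of base, which is the kind of config that becomes trainable on a single machine.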