Can you give me some details about files? #6

Closed
hufflepoohpooh opened this issue Jan 29, 2019 · 1 comment

hufflepoohpooh commented Jan 29, 2019

Thank you for your great code. I'm a student and a beginner in data analysis.
I want to run your code, but I have some questions. They may be silly, but can you give me some details about the files used in this command?


python pretrain.py \
    --train_cfg config/pretrain.json \
    --model_cfg config/bert_base.json \
    --data_file $DATA_FILE \
    --vocab $BERT_PRETRAIN/vocab.txt \
    --save_dir $SAVE_DIR \
    --max_len 512 \
    --max_pred 20 \
    --mask_prob 0.15

  1. config/pretrain.json
  2. config/bert_base.json
  3. $DATA_FILE
  4. $BERT_PRETRAIN/vocab.txt

We need a $DATA_FILE as the training set, but what is vocab.txt? I can get the vocab.txt file from Google's GitHub repo. Should I just use it as-is, or can I customize it? (I want to build a BERT with fewer parameters than BERT-Base.)
Also, is the output file model_steps_xxxx.pt compatible with the BERT code in Google's GitHub repo?

Sorry, I am not an expert, so my questions may be silly. Thank you.

dhlee347 (Owner) commented Apr 2, 2019

Sorry for the late response.

  1. You can use vocab.txt from Google's BERT repo. Modifying vocab.txt is risky because it was learned from a corpus.
  2. The output file is not compatible with Google's code.
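
To illustrate point 1, here is a minimal sketch of why editing vocab.txt can be risky. It assumes the file follows the layout of Google's released vocab.txt, with one token per line and the token id equal to the 0-based line number. The `load_vocab` helper and the toy token list are hypothetical and are not taken from this repo.

```python
import os
import tempfile


def load_vocab(path):
    """Map each token to its id; the id is the token's 0-based line number."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}


# Tiny stand-in for vocab.txt (hypothetical contents, not the real file).
tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
tmp_dir = tempfile.mkdtemp()
vocab_path = os.path.join(tmp_dir, "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("\n".join(tokens) + "\n")

vocab = load_vocab(vocab_path)

# Simulate an edit: delete a single line from the vocabulary.
edited = [t for t in tokens if t != "[MASK]"]
edited_vocab = {tok: idx for idx, tok in enumerate(edited)}

# Every token after the deleted line now has a different id.
shifted = [t for t in edited if vocab[t] != edited_vocab[t]]
print(vocab["hello"], edited_vocab["hello"])  # 5 4
print(shifted)  # ['hello', 'world']
```

Under this assumption, removing or reordering lines changes the ids of all later tokens, so any embeddings trained against the original file would no longer line up with the edited one.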

dhlee347 closed this as completed Apr 2, 2019