Training spec #6

Closed
jisngprk opened this issue Feb 24, 2021 · 2 comments
@jisngprk

jisngprk commented Feb 24, 2021

I have a question about the training spec of your model. I would like to know the sequence length, batch size, training time, GPU type, number of GPUs, number of training samples, and the loss.
It looks like you achieved a loss of 3.7. Could you describe the training parameters used to reach that performance?

def add_subparser(subparsers: argparse._SubParsersAction):

Are these the parameters that were used to get that loss?
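(For reference, a subparser like the one quoted above usually just registers the training hyperparameters on the command line. Below is a minimal, hypothetical sketch of such a function; the flag names are illustrative and not necessarily the ones this repository actually defines.)

import argparse

def add_subparser(subparsers: argparse._SubParsersAction):
    # Hypothetical sketch: register a "train" command with the kinds of
    # hyperparameters asked about above. Flag names are illustrative only.
    parser = subparsers.add_parser("train", help="train the model")
    parser.add_argument("--seq_len", type=int, default=512)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--layers", type=int, default=24)
    parser.add_argument("--dims", type=int, default=1024)
    parser.add_argument("--epochs", type=int, default=8)

if __name__ == "__main__":
    root = argparse.ArgumentParser()
    add_subparser(root.add_subparsers(dest="command"))
    print(root.parse_args(["train", "--seq_len", "512"]))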

@affjljoo3581 (Owner)

I'm sorry, but I cannot remember the detailed training configuration for the example loss figure shown in the README:

(loss figure from the README)

But I can share another training result along with its configuration. It should be helpful!

Dataset

  • I constructed a custom Korean dataset collected from several platforms. The total size of the raw text file is about 30GB and it contains about 5.04B tokens.
  • The vocabulary size is 32000 and unk-ratio is 0.00005.
  • The number of tokens in each sequence is limited to 512 (seq_len = 512); a rough chunking sketch follows this list.
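As an illustration of the seq_len = 512 constraint (a generic sketch, not the author's actual preprocessing code), a long token stream can simply be chunked into windows of at most 512 tokens:

def chunk_into_sequences(token_ids, seq_len=512):
    # Split a long stream of token ids into training sequences of at most
    # `seq_len` tokens each; the final chunk may be shorter.
    return [token_ids[i:i + seq_len] for i in range(0, len(token_ids), seq_len)]

# With ~5.04B tokens, 512-token windows would yield roughly
# 5.04e9 / 512 ≈ 9.8M training sequences.
print(len(chunk_into_sequences(list(range(2000)))))  # 4 chunks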

Model

  • The model consists of 24 transformer-encoder layers with a hidden dimensionality of 1024. The total parameter size is 304M (a rough estimate of this count is sketched below).
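As a sanity check on the 304M figure, here is a standard back-of-the-envelope estimate for a 24-layer, 1024-dimensional transformer. This is only an approximation: biases, layer norms, and whether the embedding matrices are included all shift the exact number, so 304M is plausible but cannot be reproduced exactly from these two values alone.

def estimate_params(layers=24, dims=1024, vocab=32000, seq_len=512):
    # Each transformer layer holds roughly 4*d^2 attention parameters
    # (Q, K, V, output projections) plus 8*d^2 for a 4x-wide feed-forward
    # block, i.e. about 12*d^2 per layer (biases and layer norms ignored).
    blocks = layers * 12 * dims * dims
    embeddings = vocab * dims + seq_len * dims  # token + positional embeddings
    return blocks, blocks + embeddings

print(estimate_params())  # (301989888, 335282176): ~302M in blocks, ~335M with embeddings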

Environment

  • The model was trained for 8 epochs on 2 x Tesla V100 GPUs.
  • The entire training took about 24 days (a rough throughput estimate follows this list).
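For a sense of throughput, the figures above imply roughly the following (simple arithmetic, assuming all 5.04B tokens are seen once per epoch):

tokens_per_epoch = 5.04e9        # dataset size in tokens
epochs = 8
wall_clock_s = 24 * 24 * 3600    # 24 days in seconds
gpus = 2

total_tokens = tokens_per_epoch * epochs             # ~40.3B tokens processed
print(f"{total_tokens / wall_clock_s:,.0f} tokens/s overall")         # ~19,400
print(f"{total_tokens / wall_clock_s / gpus:,.0f} tokens/s per GPU")  # ~9,700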

Result

  • test loss: 3.2398
  • test perplexity: 25.5819
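Note that the perplexity is consistent with the test loss, since perplexity is just exp(cross-entropy); the small gap from exp(3.2398) ≈ 25.53 presumably comes from rounding or from how the loss was averaged over batches:

import math

test_loss = 3.2398
print(math.exp(test_loss))   # ≈ 25.53, close to the reported 25.5819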

@jisngprk (Author)

jisngprk commented Mar 8, 2021

Thank you!
