I have a question about your model's training specs. I'd like to know the sequence length, batch size, training time, GPU type, number of GPUs, number of training samples, and loss.
It looks like you reached a loss of 3.7. Could you describe the training parameters you used to get that performance?
I'm sorry, but I can't remember the detailed training configuration behind the example loss figure in the README.
However, I can share another training result along with its configuration. It should be helpful!
Dataset
I constructed a custom Korean dataset collected from several platforms. The total size of the raw text file is about 30GB and it contains about 5.04B tokens.
The vocabulary size is 32000, and the unk ratio (the fraction of tokens mapped to the unknown token) is 0.00005.
Each sequence contains fewer than 512 tokens (seq_len = 512).
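The question above asks for the number of training samples, which the reply does not state directly. A rough figure can be derived from the reported corpus size. The sketch below copies the reported numbers (about 5.04B tokens, seq_len = 512, 8 epochs from the Environment section). Because sequences may be shorter than 512 tokens, it only gives a lower bound on the number of sequences per epoch. The batch size was not reported, so steps_per_epoch is a hypothetical helper that takes it as a parameter rather than assuming a value.

```python
# Rough sample-count arithmetic from the reported dataset figures.
# All constants are copied from the reply; the helpers are illustrative only.

TOTAL_TOKENS = 5_040_000_000  # ~5.04B tokens in the corpus (approximate)
SEQ_LEN = 512                 # maximum tokens per sequence
EPOCHS = 8                    # from the Environment section


def min_sequences(total_tokens: int, seq_len: int) -> int:
    """Lower bound on the number of sequences needed to cover the corpus.

    Each sequence holds at most `seq_len` tokens, so at least
    ceil(total_tokens / seq_len) sequences are required.
    """
    return (total_tokens + seq_len - 1) // seq_len


def steps_per_epoch(n_sequences: int, batch_size: int) -> int:
    """Optimizer steps per epoch for a given global batch size.

    Hypothetical helper: the actual batch size used was not reported.
    """
    return (n_sequences + batch_size - 1) // batch_size


n_seq = min_sequences(TOTAL_TOKENS, SEQ_LEN)
print(f"sequences per epoch >= {n_seq:,}")                         # sequences per epoch >= 9,843,750
print(f"tokens over {EPOCHS} epochs ~ {TOTAL_TOKENS * EPOCHS:,}")  # tokens over 8 epochs ~ 40,320,000,000
```

With the figures above, one epoch covers at least about 9.84M sequences, and 8 epochs process roughly 40.3B tokens. Plugging in a batch size (global, across both GPUs) into steps_per_epoch would give the number of optimizer steps per epoch.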
Model
The model consists of 24 transformer-encoder layers with a hidden dimensionality of 1024, for about 304M parameters in total.
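As a sanity check on the model size, here is a minimal parameter-count sketch. It assumes a standard GPT-2-style block: fused Q/K/V attention projection, a 4x feed-forward expansion, biases on every linear layer, two LayerNorms per block, learned position embeddings, a final LayerNorm, and a token embedding tied to the output layer. None of these details are confirmed in the reply, and gpt2_param_estimate is a hypothetical helper. The exact total depends on implementation choices, such as bias terms, weight tying, and whether the embedding tables are counted, so the estimate need not match the reported 304M exactly.

```python
# Rough GPT-2-style parameter estimate (assumed architecture, see lead-in).

def gpt2_param_estimate(n_layers: int, d_model: int, vocab_size: int,
                        max_positions: int, ffn_rate: int = 4,
                        include_token_embedding: bool = True) -> int:
    d, r = d_model, ffn_rate
    # Fused Q/K/V projection (d -> 3d) plus output projection (d -> d), with biases.
    attention = (3 * d * d + 3 * d) + (d * d + d)
    # Feed-forward d -> r*d -> d, with biases.
    feed_forward = (d * r * d + r * d) + (r * d * d + d)
    # Two LayerNorms per block, each with a gain and a bias vector.
    layer_norms = 2 * (2 * d)
    per_layer = attention + feed_forward + layer_norms

    total = n_layers * per_layer
    total += max_positions * d   # learned position embeddings
    total += 2 * d               # final LayerNorm
    if include_token_embedding:
        total += vocab_size * d  # token embedding (assumed tied with the output layer)
    return total


with_emb = gpt2_param_estimate(24, 1024, 32000, 512)
without_emb = gpt2_param_estimate(24, 1024, 32000, 512, include_token_embedding=False)
print(f"{with_emb:,}")     # 335,603,712
print(f"{without_emb:,}")  # 302,835,712
```

Under these assumptions, the configuration above comes to about 335.6M parameters including the token embedding and about 302.8M without it. The gap from 304M suggests the real model differs in some of these assumed details or counts parameters differently. As a check on the formula itself, GPT-2 small's configuration (12 layers, 768 hidden, 50257 vocab, 1024 positions) gives 124,439,808, which matches its commonly cited size of about 124M.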
Environment
The model was trained for 8 epochs on 2 x Tesla V100 GPUs.
Regarding GPT2/src/gpt2/train_model.py (line 93 in commit 71ebf91): are these the parameters that were used to get that loss?