
[QUESTION] When I train my COMET model, training seems to get stuck just as it is about to start #152

Closed
Winsome-A opened this issue Jul 20, 2023 · 2 comments
Labels
question Further information is requested

Comments

@Winsome-A

❓ Questions and Help

When I train my COMET model, the run seems to get stuck just as it is about to succeed.
Here is my training command:
CUDA_VISIBLE_DEVICES=0 comet-train --cfg /home/xusongcheng/COMET-master/configs/models/referenceless_model.yaml
This is the last part of the output in Xshell after running the command. It is stuck here:
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                | Type               | Params
----------------------------------------------------
0 | encoder             | XLMREncoder        | 558 M
1 | layerwise_attention | LayerwiseAttention | 26
2 | train_metrics       | RegressionMetrics  | 0
3 | val_metrics         | ModuleList         | 0
4 | estimator           | FeedForward        | 10.5 M
----------------------------------------------------
10.5 M    Trainable params
558 M     Non-trainable params
569 M     Total params
1,138.661 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
I tried adding “--num_workers 0” to the command, but it did not help.
How should I solve this?
Best wishes,
Winsome

@Winsome-A Winsome-A added the question Further information is requested label Jul 20, 2023
@ricardorei
Collaborator

Hi @Winsome-A, I have never seen this error. Could you provide some extra information? Which pytorch-lightning version are you using?
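For anyone hitting the same hang, the version details asked for here can be gathered with a short script. This is only a sketch: the helper names `package_version` and `collect_environment` are made up for illustration, and it assumes COMET was installed from PyPI under the distribution name `unbabel-comet`. Packages that are not installed are reported as `None`.

```python
from importlib import metadata


def package_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def collect_environment():
    """Collect package versions plus the CUDA/cuDNN versions PyTorch was built with."""
    info = {
        name: package_version(name)
        for name in ("torch", "pytorch-lightning", "unbabel-comet")
    }
    info.update({"cuda_build": None, "cuda_available": None, "cudnn": None})
    try:
        import torch
    except Exception:
        # torch is missing, or its native libraries failed to load
        return info
    info["cuda_build"] = torch.version.cuda              # CUDA version torch was compiled against
    info["cuda_available"] = torch.cuda.is_available()   # whether a usable GPU is visible
    info["cudnn"] = torch.backends.cudnn.version()       # None when cuDNN is unavailable
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives maintainers the library and driver-side versions in one place.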

@Winsome-A
Author

Oh yes, thanks for your email. I upgraded my CUDA and cuDNN versions, and eventually the problem went away.
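A quick way to check whether the GPU stack works independently of COMET is to run one small operation on the device and wait for it to finish. The sketch below is an assumption about how one might test this, not something taken from the thread: `cuda_smoke_test` is a hypothetical helper, and it uses a tiny convolution because convolutions on the GPU are typically dispatched to cuDNN. If this call also hangs or raises, the problem is likely in the CUDA/cuDNN installation rather than in the training configuration.

```python
def cuda_smoke_test():
    """Run a tiny convolution on the GPU and report what happened."""
    try:
        import torch
    except Exception:
        return "torch not importable"
    if not torch.cuda.is_available():
        return "CUDA not available"
    x = torch.randn(1, 3, 32, 32, device="cuda")    # one small fake image batch
    w = torch.randn(8, 3, 3, 3, device="cuda")      # eight 3x3 filters
    y = torch.nn.functional.conv2d(x, w)
    torch.cuda.synchronize()                         # block until the kernel has finished
    name = torch.cuda.get_device_name(0)
    return f"ok: conv2d on {name}, output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(cuda_smoke_test())
```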


2 participants