training does not work #18
Comments
Hey @listener17, sorry about that! I'll look into it this week. For now, can you try launching via
torchrun --nproc_per_node 1 scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/
Just curious if that works.
@pseeth: thanks.
@pseeth:
I get this error:
@pseeth:
On a different GPU server, I'm getting a similar but different error message at the same place:
FYI: BUT, if I add:
at the top of https://github.com/descriptinc/descript-audio-codec/blob/main/tests/test_train.py
I tried the same trick with train.py, but training still does not work!
I created a clean conda environment, followed your installation steps, and ... it was not working. However, by luck, the training was working on my colleague's (unclean) environment.
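For anyone comparing a working environment against a broken one, here is a minimal sketch that diffs two `pip freeze` outputs and lists the packages whose pinned versions differ. The helper names `parse_freeze` and `diff_freeze` are made up for illustration, and it only understands plain `name==version` lines (editable installs and URL requirements are skipped).

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {package_name: version}.

    Only plain `name==version` lines are handled; anything else
    (comments, editable installs, direct URLs) is skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_freeze(working, broken):
    """Return {name: (version_in_working, version_in_broken)} for every
    package whose version differs; None means the package is absent."""
    a, b = parse_freeze(working), parse_freeze(broken)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Run `pip freeze > working.txt` in the environment where training starts and `pip freeze > broken.txt` in the clean one, then pass the two file contents to `diff_freeze`. The result is only a list of candidates to look at, not a diagnosis.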
Can you share the environment and reopen the issue? We're hitting the same thing. Edit: Colleague says adding the following fixed it:
Hi all:
Did anyone manage to start the training?
If yes, could you please share your environment?
I created a separate virtual environment (Python 3.10.11). I'm using CUDA 11.4 on Ubuntu 20.04.2 LTS.
I followed all the instructions:
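To make environments easier to share in a thread like this, a small script along these lines could print the Python version, the platform, and the installed versions of a few packages. This is only a sketch: the function name `environment_report` is hypothetical, and the default package names are my assumption about what is relevant here, so adjust the list as needed.

```python
import importlib.metadata as md
import platform


def environment_report(packages=("torch", "torchaudio", "descript-audio-codec")):
    """Collect basic environment facts into a dict.

    The default package names are an assumption about what matters for
    this project; pass a different tuple to check other packages.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = md.version(name)
        except md.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

It does not report the CUDA or driver version; the output of `nvidia-smi` would still need to be pasted separately.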
pip install git+https://github.com/descriptinc/descript-audio-codec
Encoding + decoding works!
Then I did the training prerequisites step:
pip install -e ".[dev]"
When I start training:
It stays stuck for a long time displaying the output below, and then exits!
Any idea why training is not working?
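One generic way to see where a Python process is stuck is the standard-library `faulthandler` module, which can dump every thread's traceback after a timeout. The sketch below is not specific to this repository and does not claim to identify the cause here; the helper names `arm_hang_dump` and `disarm_hang_dump` are hypothetical, and the assumption is that you call `arm_hang_dump()` near the top of the training script.

```python
import faulthandler
import sys


def arm_hang_dump(timeout_s=300.0, file=None):
    """Dump all thread tracebacks every `timeout_s` seconds until disarmed.

    `file` must be a real file object with a file descriptor;
    it defaults to sys.stderr.
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
    return target


def disarm_hang_dump():
    """Cancel the pending traceback dump, e.g. once training has started."""
    faulthandler.cancel_dump_traceback_later()
```

If the process hangs, the tracebacks written to stderr show which call each thread is blocked in, which is usually more useful to paste into the issue than "it exits".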