Why are some submissions stuck in "running" or "submitted" forever? #572
@magrichard please try again and let me know if you experience any problems; you may need to create a new queue. By the way, any old compute workers will need to be updated to use:
BROKER_URL=amqp://user:pass@broker.codabench.org:9001/vhost
BROKER_USE_SSL=True
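For anyone updating an older compute worker, here is a minimal sketch that checks a worker's broker settings against the values quoted above. It assumes the settings are available as a plain dict of environment variables; the function name `check_broker_settings` and the specific checks are illustrative assumptions, not part of Codabench.

```python
from urllib.parse import urlparse


def check_broker_settings(env: dict) -> list:
    """Return a list of problems found in a worker's broker settings.

    Hypothetical helper: the expected host, port and SSL flag come from the
    comment above; the checks themselves are an illustration, not Codabench code.
    """
    problems = []
    url = urlparse(env.get("BROKER_URL", ""))
    if url.scheme != "amqp":
        problems.append("BROKER_URL should use the amqp:// scheme")
    if url.hostname != "broker.codabench.org":
        problems.append("BROKER_URL should point at broker.codabench.org")
    if url.port != 9001:
        problems.append("BROKER_URL should use port 9001")
    if not url.username or not url.password:
        problems.append("BROKER_URL is missing user:pass credentials")
    if url.path.strip("/") == "":
        problems.append("BROKER_URL is missing the vhost")
    # Environment variables are strings, so compare case-insensitively.
    if env.get("BROKER_USE_SSL", "").lower() != "true":
        problems.append("BROKER_USE_SSL should be True")
    return problems


if __name__ == "__main__":
    # "old-host:5672" is a made-up example of an outdated setting.
    old = {"BROKER_URL": "amqp://user:pass@old-host:5672/vhost",
           "BROKER_USE_SSL": "False"}
    for problem in check_broker_settings(old):
        print(problem)
```

An empty list means the settings match what was posted above; anything else points at the line of the worker's configuration that still needs updating.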
Hi @ckcollab, I ran some tests this morning. I am afraid we are still facing the same issue. Submission #1281 ran successfully (5 to 10 minutes). Submissions #1289 and #1295, which use .zip files that previously ran successfully (see submissions #1180 and #1185), are also stuck, and I cannot access their log files: when I click on the submission, I get a blank field (screenshot below). Also, please note that so far we are using the default queue, as the compute workers have not yet been set up by our partner in Heidelberg. Thanks for your help!
Appreciate your patience; can you test again? One behavior I am seeing is that only your latest submission updates its status in real time. If you make many submissions, the older ones may appear to be stuck, but a page refresh reveals the submissions' true state. I opened #576 for this, and #577 to track the blank submission details dialog.
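The stale-status behavior described above comes down to which submissions get re-queried. A minimal sketch of the alternative, refreshing every submission that has not reached a final state rather than only the latest one, could look like this. The status names in `FINAL_STATES` and the `fetch_status` callback are hypothetical placeholders, not the actual Codabench frontend or API.

```python
from typing import Callable, Dict

# Hypothetical terminal statuses; real Codabench status names may differ.
FINAL_STATES = {"Finished", "Failed", "Cancelled"}


def refresh_statuses(known: Dict[int, str],
                     fetch_status: Callable[[int], str]) -> Dict[int, str]:
    """Re-query every submission that is not yet in a final state,
    instead of only the most recent one."""
    updated = dict(known)
    for sub_id, status in known.items():
        if status not in FINAL_STATES:
            # Only non-final submissions can still change, so only they are polled.
            updated[sub_id] = fetch_status(sub_id)
    return updated


if __name__ == "__main__":
    # Made-up server-side state and a stale client-side view of it.
    server = {1: "Finished", 2: "Finished", 3: "Running"}
    stale = {1: "Finished", 2: "Running", 3: "Submitted"}
    print(refresh_statuses(stale, server.__getitem__))
```

Skipping submissions that are already final keeps the number of requests proportional to the work still in flight.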
Hi Eric, do you mean we can't upload more than one submission at the same time? @ckcollab
If I understand you correctly, we just need to refresh the page and the submissions' statuses should be updated correctly?
Hi, I am sorry, but I still have the feeling the problem is not solved. For instance, on competition #199, I launched a submission (ID 1691) more than 3 hours ago against a subset of tasks. Its status shows 'running' both in the submission table and on the server status page (https://www.codabench.org/server_status). However, the server status page shows that none of the child submissions has been launched. When I click on the submission in the regular interface, I get a blank field.
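The symptom above (a parent submission marked "running" for hours while none of its child submissions has started) can be turned into a simple check. This is a minimal sketch under stated assumptions: the `Submission` record, its field names, and the one-hour default threshold are all hypothetical, not the Codabench data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Submission:
    # Hypothetical record; field names are illustrative, not the Codabench schema.
    id: int
    status: str
    started_at: datetime
    children_started: int = 0


def looks_stuck(sub: Submission, now: datetime,
                max_wait: timedelta = timedelta(hours=1)) -> bool:
    """Flag a parent submission that is 'running' but has launched no
    child submissions after max_wait has elapsed."""
    return (
        sub.status == "running"
        and sub.children_started == 0
        and now - sub.started_at > max_wait
    )


if __name__ == "__main__":
    # Made-up timestamps for illustration.
    parent = Submission(id=1691, status="running",
                        started_at=datetime(2022, 1, 1, 9, 0))
    print(looks_stuck(parent, now=datetime(2022, 1, 1, 12, 30)))
```

The threshold is a judgment call: it should be longer than the time a healthy submission normally takes to dispatch its first child.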
Currently the compute workers have 30GB of space; I can try with larger workers. How much space should I allocate? I tested this locally, ran into some storage space problems, allocated more space to Docker, and was then able to run it, seemingly successfully (still processing).
I've been running the large submission on my MacBook for a few hours now, and it is just about done. It seems to be going OK on my 1 TB drive. The compute workers we have right now only have 30GB of storage and run out of space during the submission, which leaves it in inconsistent states. Ashwini should have working compute workers with more resources, probably early next week, and this will likely resolve some of the problems.
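Since running out of disk space seems to leave submissions in inconsistent states, a worker could check free space before starting a large job. This minimal sketch uses the standard library's `shutil.disk_usage`; the function name, the example path, and the 50 GB requirement are hypothetical and would need to match wherever the worker actually stores submission data.

```python
import shutil

GB = 10**9  # decimal gigabytes, matching the "30GB" figure above


def has_enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_gb` gigabytes free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


if __name__ == "__main__":
    # Hypothetical path and requirement; adjust to the worker's data directory.
    data_dir = "/tmp"
    if not has_enough_space(data_dir, 50):
        print(f"Not enough free space in {data_dir} for this submission")
```

Failing early with a clear message is easier to diagnose than a submission that silently stays in "running" after the disk fills up.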
Resolving some issues with submission statuses here: We've made a few changes recently, and I believe some of these glitches should now be resolved. Closing this for now; please re-open with additional details if you experience more problems.
Dear all,
I have noticed something very strange that is a problem for our implementation of the Codabench benchmark (and is blocking the connection with the meteor webapp).
I am currently testing things (with small datasets) using competition #183:
https://www.codabench.org/competitions/183/#/participate-tab
The same submission.zip file can run successfully within a few minutes, or get stuck forever in 'submitted' or 'running' status. This also happens when we make submissions in bot mode.
I can't see any error message that would explain this behaviour.
Do you have any idea why this is happening and what we should do to solve this issue?
We are currently trying to set up compute workers with our partner in Heidelberg, but they are struggling to get public access to their machine. However, I am not sure that this would solve the problem.
Thanks for your input!
Magali