runs stuck in progress #2995
Hey @Laiaborrell! Thanks a lot for the report. That seems kinda strange, as there's no scenario in which runs can reactivate by themselves. My only guess is that someone tried to delete the runs and something went wrong during the deletion, which is why the runs are shown as not found. As |
Hey @mihran113, thanks for your reply!! I checked and the hashes for the runs are still in the |
Hi @Laiaborrell, did you manage to solve this? I'm having the same issue using LangChain callbacks. |
Hello @Maximiliano-Villanueva, I didn't manage to solve it. I had to relaunch the hyperparameter search, sorry about that. I hope someone else can help; a fix would be useful for any future issues like this. |
@mihran113 Do you know if there is any update on this? I also see the run in meta/chunks, and I am not able to delete the runs because they appear as online in the UI. Restarting the server seems to fix the issue; I hope this is useful for fixing it! Would it be possible to add a "force delete" button to force the deletion of running runs? ETA: when restarting the server, some runs are not deleted because they are "locked". |
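The thread compares run hashes against the `meta/chunks` directory to see whether runs that should be gone still left data behind. Below is a minimal, read-only stdlib sketch of that check. It assumes the Aim repository lives at `<repo_root>/.aim` and that each entry in `.aim/meta/chunks` is named after a run hash, as the comments above suggest; the helper names `chunk_run_hashes` and `leftover_hashes` are hypothetical and are not part of Aim's API. It never deletes or unlocks anything.

```python
from pathlib import Path


def chunk_run_hashes(repo_root: str) -> set[str]:
    """Return the names of the entries under <repo_root>/.aim/meta/chunks.

    Assumption (not confirmed by Aim docs here): each entry is named after a run hash.
    """
    chunks_dir = Path(repo_root) / ".aim" / "meta" / "chunks"
    if not chunks_dir.is_dir():
        # No repository (or no chunks yet): nothing to report.
        return set()
    return {entry.name for entry in chunks_dir.iterdir()}


def leftover_hashes(repo_root: str, expected_deleted: list[str]) -> list[str]:
    """Return, sorted, the hashes you expected to be deleted that still have a chunk entry."""
    present = chunk_run_hashes(repo_root)
    return sorted(h for h in expected_deleted if h in present)
```

For example, `leftover_hashes(".", ["<hash1>", "<hash2>"])` run from the directory containing `.aim` lists which of those (placeholder) hashes are still on disk, which is the kind of detail worth attaching to a report like this one.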
@Michael-Tanzer Can I ask you to share the logs from |
I have now deleted the problematic runs by deleting the lock manually and then deleting from the UI. I will share a log as soon as it happens again. |
Let me know when that happens again. It's pretty hard to reproduce, but the error should reveal a lot about what's happening, which would help a lot. |
Regarding the force delete, we'll consider implementing it for the next minor version: |
When running `aim up` and checking the runs in the UI, it looks like 4 out of 5 runs are stuck "in progress" (green dots), even though they have already finished training. Only the last one trained is tagged as "finished":

![image](https://private-user-images.githubusercontent.com/78802600/268623062-2957b7c0-ace8-41fa-b2ea-b12ea100cc89.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA2Nzg5MjIsIm5iZiI6MTcyMDY3ODYyMiwicGF0aCI6Ii83ODgwMjYwMC8yNjg2MjMwNjItMjk1N2I3YzAtYWNlOC00MWZhLWIyZWEtYjEyZWExMDBjYzg5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzExVDA2MTcwMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE1ZGUwYzRjNDQxMDBlZWUyMTFkYWQxM2I4M2VlZjZjYzU4YzZkY2YwMWM1ZmFkY2Q3MWFiZjhjYTdjMjZlMmImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.lC6lT0uKojxls0B6EE35mBSTN6TC_ii5W-uvhUv0BGQ)

A couple of days ago, when I last checked the state of the trainings, those runs were already tagged as finished, but somehow they have now been reactivated.

Because of this, when I open these runs to check the metrics and figures, a pop-up with the following message appears: "Error. Run not found":

![image](https://private-user-images.githubusercontent.com/78802600/268623252-b86495dc-911d-40d0-a1e2-8558e66af37e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA2Nzg5MjIsIm5iZiI6MTcyMDY3ODYyMiwicGF0aCI6Ii83ODgwMjYwMC8yNjg2MjMyNTItYjg2NDk1ZGMtOTExZC00MGQwLWExZTItODU1OGU2NmFmMzdlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzExVDA2MTcwMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWExYzcxMDgyMDM3M2ZmNzg1NDdhYjk0ZmY4NjVkY2NlODNlMDU5MzYxY2NiMDdiOTBkNTE5ZjAwOWVjZGVkN2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.DXwqOTbrlhq3wYmBR-hhzCKGegQGKmyETeGgFLJOa3Q)
Note that no error is printed in the terminal where the `aim up` command is running.
I would really appreciate any help,
thanks!