Train remotely from a training API #139
Hi there! Thanks for filing a bug report :)

First of all, I would never recommend calling

What seems to happen is that when executing remotely, ClearML will try to recognize and interpret your local environment, so it can be recreated on the remote machine. But since you're running it from the API, all the API requirements are also detected as dependencies of the task itself :)

That said, I think I see what you're trying to do. I would go for the following flow instead:
Now your task should be in the queue, the API can return successfully, and a worker can start working on it. You'll need a second API endpoint that a client can poll (every minute, for example) to get the status. You can of course also just return the task_id from the first endpoint when the task is created, so a client can get updates by querying the ClearML server directly.

Does this help?
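A minimal sketch of that flow, assuming a template task already exists on the ClearML server and its ID is known. The helper names `launch_training` and `training_status`, the `task_cls` argument (there only so the helpers can be tried without a server), and the parameter name `General/epochs` are hypothetical; the ClearML calls used are `Task.clone`, `Task.set_parameter`, `Task.enqueue`, `Task.get_task` and `Task.get_status`.

```python
from typing import Any, Dict, Optional


def launch_training(template_task_id: str, queue_name: str,
                    hyperparams: Dict[str, Any],
                    task_cls: Optional[Any] = None) -> str:
    """Clone a template task, override hyperparameters, enqueue the clone."""
    if task_cls is None:
        # Imported lazily so the sketch can be loaded without the SDK installed.
        from clearml import Task as task_cls
    # The clone is a draft copy of the template; the template itself is untouched.
    cloned = task_cls.clone(source_task=template_task_id,
                            name="API-triggered training")
    # Hypothetical parameter names, e.g. {"General/epochs": 5}.
    for name, value in hyperparams.items():
        cloned.set_parameter(name, value)
    # Hand the clone to the agent listening on the queue; this returns right away.
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned.id


def training_status(task_id: str, task_cls: Optional[Any] = None) -> str:
    """Look up the current status string of a previously enqueued task."""
    if task_cls is None:
        from clearml import Task as task_cls
    return task_cls.get_task(task_id=task_id).get_status()
```

The first helper would sit behind the "start training" endpoint and return the task ID to the client; the second would sit behind the status endpoint that the client polls. Neither calls `execute_remotely()`, so the API process itself never becomes the task.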
@thepycoder Good idea! But how can I get that "first" training that will be used as a template? I have carried out local YOLO trainings (working correctly without the API), but they don't work as a template :( When I try to clone and run them in my agent queue from app.clearml, the console says some files are not found, the

Thanks for the quick reply!
Like you were thinking yourself, the agent indeed needs to be able to access your
In this way, the agent is no different from e.g. a Colab instance: you'll have to give it a way to access your files :)
Hi, I've been using ClearML to track my YOLOv8 trainings, which were carried out locally. To do so, I built an API with FastAPI that can launch a training with custom hyperparams.

Now I'm trying to do the same remotely, so I built an agent and a queue (`cola_yolo8`) served by that agent. However, if I add `execute_remotely()` in the API training method:

This error shows up in the ClearML task console:
I run this command to run the training API:
uvicorn main:app --host 0.0.0.0 --port 3000 --reload
I've tried simpler tasks instead of yolov8 training and the error is the same.
Thanks!