Adding capability to start a training from model checkpoint instead of doing it from scratch #297
Add train from checkpoint capability
@karolzak thanks for your good work. I want to fine-tune the model, but I could not find out how to do it. Could you please let me know how to use your work? Thanks.
Thanks @Abbsalehi! More specifically, you need to add a variable like below:
In my trials, I created a new config called
After you create your new config you can run something like this to kick off the training:
When this new variable is present in the config, the training script will try to instantiate the model from a previously trained model checkpoint. In my trials I just used
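The behaviour described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code from this PR: `TinyModel`, `build_model`, and the config key `load_checkpoint_path` are hypothetical names chosen for the example, and the checkpoint is a pickled dict rather than a real LaMa checkpoint. In a PyTorch project the same pattern would typically go through `torch.load` and `load_state_dict`.

```python
import pickle
import tempfile
from pathlib import Path


class TinyModel:
    """Stand-in for a real network: holds named weights in a plain dict."""

    def __init__(self):
        # Fresh ("from scratch") initialisation.
        self.weights = {"layer.weight": 0.0, "layer.bias": 0.0}

    def state_dict(self):
        return dict(self.weights)

    def load_state_dict(self, state):
        # Refuse checkpoints that do not cover every weight of the model.
        missing = set(self.weights) - set(state)
        if missing:
            raise KeyError(f"checkpoint is missing weights: {sorted(missing)}")
        self.weights.update({name: state[name] for name in self.weights})


def build_model(config):
    """Create a model; warm-start it if the config names a checkpoint."""
    model = TinyModel()
    # Hypothetical key name: the actual variable added by the PR is not shown here.
    ckpt_path = config.get("load_checkpoint_path")
    if ckpt_path:
        with open(ckpt_path, "rb") as f:
            checkpoint = pickle.load(f)
        model.load_state_dict(checkpoint["state_dict"])
    return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "last.ckpt"
        # Pretend these weights came from an earlier training run.
        trained = TinyModel()
        trained.weights["layer.weight"] = 1.5
        with open(path, "wb") as f:
            pickle.dump({"state_dict": trained.state_dict()}, f)

        scratch = build_model({})  # key absent: train from scratch
        warm = build_model({"load_checkpoint_path": str(path)})  # key present
        print(scratch.weights["layer.weight"], warm.weights["layer.weight"])
```

Making the key optional means existing configs keep training from scratch, and only a config that explicitly points at a checkpoint gets the warm start.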
@karolzak thanks a lot for your helpful response. The README says to provide the directories below. How many images did you put in these folders? I do not have many images.
README: You need to prepare the following image folders:
$ ls my_dataset
I followed the recommendation from the docs, but I'm not sure whether it is strictly needed. I don't know if it comes from some hardcoded checks or if it is more of a "for best performance" kind of suggestion. I would suggest trying with however many images you have and seeing what happens.
Thanks @karolzak, I was able to start training the model. However, I am wondering whether it is possible to use multiple GPUs to accelerate the process.
@karolzak could you please help me understand the table below, from the validation results of one epoch? From which metric is "std" calculated? Why are some values NaN? And what are the percentage ranges in the first column?
Hey guys, @karolzak, @Abbsalehi! Could you please provide a link to the "e-commerce" dataset described in the blog? The Kaggle link provided there does not seem to exist anymore :(
Small change introducing the option to provide a path (through the location config) to a model checkpoint whose weights are loaded before starting a new training run. I used this successfully to fine-tune the LaMa model on my custom dataset.
CC: @senya-ashukha @cohimame