Use --resume, error: FileNotFoundError: [Errno 2] No such file or directory: '.cfg' #378

Closed
Joker9194 opened this issue Dec 7, 2021 · 13 comments


@Joker9194

I used the command: python train.py --device 0 --batch-size 4 --img 640 640 --cfg cfg/yolov4-pacsp-x-mish.cfg --data data/mydata.yaml --weight weights/yolov4-csp-x-mish.weights --name yolov4-pacsp-x-mish --resume, but got the error: FileNotFoundError: [Errno 2] No such file or directory: '.cfg'

So what happened? Is my command wrong?

@Joker9194 (Author)

Looking at the code, why is opt.cfg set to ''?
if opt.resume:  # resume an interrupted run
    ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run()  # specified or most recent path
    assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
    with open(Path(ckpt).parent.parent / 'opt.yaml') as f:
        opt = argparse.Namespace(**yaml.load(f, Loader=yaml.FullLoader))  # replace
    opt.cfg, opt.weights, opt.resume = '', ckpt, True
    logger.info('Resuming training from %s' % ckpt)
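
For what it's worth, the empty string also explains the odd '.cfg' filename in the error: the Darknet cfg parser appends the suffix when it is missing, so '' becomes the literal path '.cfg'. A minimal sketch of that behavior (reconstructed, not copied verbatim from the repo's parse_config code):

def parse_model_cfg(path):
    # sketch: '' does not end with '.cfg', so the parser turns it into the
    # literal filename '.cfg' and then fails to open it
    if not path.endswith('.cfg'):
        path += '.cfg'
    with open(path, 'r') as f:  # FileNotFoundError: [Errno 2] No such file or directory: '.cfg'
        return f.read()

parse_model_cfg('')  # reproduces the error from the title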

@Grutschus commented Dec 14, 2021

I ran into the same issue.

It appears that the config file is needed regardless of whether a checkpoint is loaded:

pretrained = weights.endswith('.pt')
if pretrained:
    with torch_distributed_zero_first(rank):
        attempt_download(weights)  # download if not found locally
    ckpt = torch.load(weights, map_location=device)  # load checkpoint
    model = Darknet(opt.cfg).to(device)  # create  <-- needs opt.cfg even when loading a checkpoint
    state_dict = {
        k: v
        for k, v in ckpt['model'].items()
        if model.state_dict()[k].numel() == v.numel()
    }
    model.load_state_dict(state_dict, strict=False)
    print('Transferred %g/%g items from %s' %
          (len(state_dict), len(model.state_dict()), weights))  # report
else:
    model = Darknet(opt.cfg).to(device)  # create

To me it seems like the plan was originally to save (and serialize) the config together with the model here:

        save = (not opt.nosave) or (final_epoch and not opt.evolve)
        if save:
            with open(results_file, 'r') as f:  # create checkpoint
                ckpt = {'epoch': epoch,
                        'best_fitness': best_fitness,
                        'best_fitness_p': best_fitness_p,
                        'best_fitness_r': best_fitness_r,
                        'best_fitness_ap50': best_fitness_ap50,
                        'best_fitness_ap': best_fitness_ap,
                        'best_fitness_f': best_fitness_f,
                        'training_results': f.read(),
                        'model': ema.ema.module.state_dict() if hasattr(ema, 'module') else ema.ema.state_dict(),
                        'optimizer': None if final_epoch else optimizer.state_dict(),
                        'wandb_id': wandb_run.id if wandb else None}

This is similar to what has been done here

However, the config is not saved in the ckpt at the moment, which is why the *.cfg file is needed. I feel like saving it there would be a good idea, though.
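
If one wanted to follow that idea, a minimal sketch could be the following (the 'cfg' key here is hypothetical; it is not part of the current checkpoint format):

# when saving: carry the cfg path along in the checkpoint dict (hypothetical key)
ckpt['cfg'] = opt.cfg

# when resuming: restore it before building the model
ckpt = torch.load(weights, map_location=device)
if not opt.cfg and ckpt.get('cfg'):
    opt.cfg = ckpt['cfg']  # fall back to the cfg path stored in the checkpoint
model = Darknet(opt.cfg).to(device)

One could also store the full text of the cfg file instead of the path, which would make checkpoints self-contained across machines.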

TLDR:
Config file is needed. Changing the commented line to the one below fixed the issue for me.

    # opt.cfg, opt.weights, opt.resume = '', ckpt, True
    opt.weights, opt.resume = ckpt, True

@Joker9194 (Author) commented Dec 20, 2021

Continue training with the following code? I will try it.
# opt.cfg, opt.weights, opt.resume = '', ckpt, True
opt.weights, opt.resume = ckpt, True

@Grutschus

Continue training with the following code? I will try it. # opt.cfg, opt.weights, opt.resume = '', ckpt, True opt.weights, opt.resume = ckpt, True

At least that worked for me. Be aware, though, that once you've tried to continue training without that change to the code, the script will overwrite the run's opt.yaml file, so the cfg entry in that .yaml will be empty (cfg: ''). You might have to fix that too...
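
For reference, a one-off repair of the overwritten file could look like this (the run path and cfg name below are examples taken from this thread; adjust them to your own run):

import yaml

opt_path = 'runs/train/yolov4-pacsp-x-mish/opt.yaml'  # example run directory
with open(opt_path) as f:
    opt = yaml.safe_load(f)

opt['cfg'] = 'cfg/yolov4-pacsp-x-mish.cfg'  # restore the emptied entry

with open(opt_path, 'w') as f:
    yaml.dump(opt, f, sort_keys=False)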

@Joker9194 (Author)

At least that worked for me. Be aware, though, that once you've tried to continue training without that change to the code, the script will overwrite the run's opt.yaml file, so the cfg entry in that .yaml will be empty (cfg: ''). You might have to fix that too...

ok, thx.

@YoungjaeDev

with open(save_dir / 'opt.yaml', 'w') as f:
    yaml.dump(vars(opt), f, sort_keys=False)

It doesn't look like the dumped yaml is being written here.

@Joker9194 (Author)

with open(save_dir / 'opt.yaml', 'w') as f:
    yaml.dump(vars(opt), f, sort_keys=False)

It doesn't look like the dumped yaml is being written here.

Where is that code?

@YoungjaeDev

@Joker9194

PyTorch_YOLOv4/train.py

Lines 57 to 60 in eb5f166

with open(save_dir / 'hyp.yaml', 'w') as f:
    yaml.dump(hyp, f, sort_keys=False)
with open(save_dir / 'opt.yaml', 'w') as f:
    yaml.dump(vars(opt), f, sort_keys=False)

@Joker9194 (Author)

@Joker9194

PyTorch_YOLOv4/train.py

Lines 57 to 60 in eb5f166

with open(save_dir / 'hyp.yaml', 'w') as f:
    yaml.dump(hyp, f, sort_keys=False)
with open(save_dir / 'opt.yaml', 'w') as f:
    yaml.dump(vars(opt), f, sort_keys=False)

I checked runs/train/<name>/: the opt.yaml is saved during training, but it is not loaded when using --resume.

@YoungjaeDev

Continue training with the following code? I will try it. # opt.cfg, opt.weights, opt.resume = '', ckpt, True opt.weights, opt.resume = ckpt, True

@Joker9194

@Joker9194 (Author)

Continue training with the following code? I will try it. # opt.cfg, opt.weights, opt.resume = '', ckpt, True opt.weights, opt.resume = ckpt, True

@Joker9194

It did not work for me.

@Joker9194 (Author)

I ran into the same issue. [...] TLDR: Config file is needed. Changing the commented line to the one below fixed the issue for me. # opt.cfg, opt.weights, opt.resume = '', ckpt, True opt.weights, opt.resume = ckpt, True

I checked the code, and it works for me. In the code at
https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/eb5f1663ed0743660b8aa749a43f35f505baa325/train.py#L500-501
the new opt replaces the old opt, and with this change opt.cfg always exists. I think the original line is a bug?

@thinktu2

This problem is caused by the cfg field in opt.yaml not being loaded correctly, I think.
To resolve it, pass the --cfg /path/to/cfg param in the shell and modify the code in train.py from line 497:

# Resume
if opt.resume:  # resume an interrupted run
    ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run()  # specified or most recent path
    assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
    cfg = opt.cfg if opt.cfg is not None else ''  ###################ADD#######################
    with open(Path(ckpt).parent.parent / 'opt.yaml') as f:
        opt = argparse.Namespace(**yaml.load(f, Loader=yaml.FullLoader))  # replace
    opt.cfg, opt.weights, opt.resume = cfg, ckpt, True ###################CHANGE#######################
    logger.info('Resuming training from %s' % ckpt)

It works for me, anyway.
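
Assuming --resume also accepts an explicit checkpoint path (which the isinstance(opt.resume, str) check above suggests), resuming would then look something like this; the paths below are examples, adjust them to your run:

python train.py --resume --cfg cfg/yolov4-pacsp-x-mish.cfg
# or, pointing at a specific checkpoint:
python train.py --resume runs/train/yolov4-pacsp-x-mish/weights/last.pt --cfg cfg/yolov4-pacsp-x-mish.cfg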
