
Checkpoint is not found #237

Closed
Caioww opened this issue Aug 26, 2020 · 23 comments

Comments

@Caioww

Caioww commented Aug 26, 2020

I am trying to run this command after I have trained the network but it is giving an error.

python -m keras_segmentation predict \
 --checkpoints_path="path_to_checkpoints" \
 --input_path="dataset1/images_prepped_test/" \
 --output_path="path_to_predictions"

  File "C:\Users\Caiow\AppData\Local\Programs\Python\Python38\lib\site-packages\keras_segmentation\predict.py", line 175, in predict_multiple
    model = model_from_checkpoint_path(checkpoints_path)
  File "C:\Users\Caiow\AppData\Local\Programs\Python\Python38\lib\site-packages\keras_segmentation\predict.py", line 29, in model_from_checkpoint_path
    assert (latest_weights is not None), "Checkpoint not found."
AssertionError: Checkpoint not found.

@rosegit-lab

I have a similar issue. What format are your weights saved in?

In my case, they are saved as w.1.index instead of w.1.

@Caioww
Author

Caioww commented Aug 27, 2020

> I have a similar issue. What format are your weights saved in?
>
> In my case, they are saved as w.1.index instead of w.1.

Mine are saved as vgg_unet_1.1.index.

@rosegit-lab

I think the problem comes from the weight file extension. The extension should be a number so that the latest checkpoint can be found during prediction.
I ran this two months ago and everything worked without any problem, but after some update the format in which the weights are saved seems to have been corrupted. @divamgupta
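
To illustrate why a trailing `.index` can break the lookup, here is a simplified sketch of a numeric-suffix check. It is not the library's actual code; `epoch_from_name` and the prefix `vgg_unet_1` are names chosen for this example, based on the file names reported above.

```python
def epoch_from_name(filename, prefix):
    """Return the epoch encoded after `prefix`, or None if the suffix is not a number."""
    if not filename.startswith(prefix):
        return None
    # Remove the prefix, then the dots around whatever is left
    suffix = filename[len(prefix):].strip(".")
    return int(suffix) if suffix.isdecimal() else None


# Hypothetical prefix, taken from the file name reported in this thread
prefix = "vgg_unet_1"
for name in ("vgg_unet_1.1", "vgg_unet_1.1.index"):
    print(name, "->", epoch_from_name(name, prefix))
```

The first name yields epoch 1. The second yields None because `1.index` is not purely numeric, so a finder that applies this kind of check without first removing `.index` would see no checkpoints at all.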

@divamgupta
Owner

Yes, this change is caused by newer versions of TF/Keras. Downgrading should work. I will also push an update shortly.

@Caioww
Author

Caioww commented Aug 28, 2020

> Yes, this change is caused by newer versions of TF/Keras. Downgrading should work. I will also push an update shortly.

I am using TensorFlow 2.2.0. I can't downgrade with pip, because 2.2.0 is the last version available.

@rosegit-lab

@Caioww you can try "pip install tensorflow==1.4" (or any version you want).

@Caioww
Author

Caioww commented Aug 29, 2020

> @Caioww you can try "pip install tensorflow==1.4" (or any version you want).

ERROR: Could not find a version that satisfies the requirement tensorflow==1.4 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
ERROR: No matching distribution found for tensorflow==1.4
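
The version list in this error starts at 2.2.0, which is consistent with the interpreter in use: the traceback paths above show Python 3.8, and TensorFlow 1.x wheels on PyPI were built for Python 3.7 and earlier. A minimal sketch for checking this before attempting a downgrade follows. The helper name `tf1_wheels_plausible` is hypothetical, and the 3.7 cut-off is an assumption worth verifying against the PyPI file list for the exact release you want.

```python
import sys

# Assumption: TensorFlow 1.x wheels target Python 3.7 or older
TF1_MAX_PYTHON = (3, 7)


def tf1_wheels_plausible(version_info=None):
    """Return True if the given (or current) Python is old enough for TF 1.x wheels."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return major_minor <= TF1_MAX_PYTHON


print("Python", sys.version_info[:2], "-> TF 1.x plausible:", tf1_wheels_plausible())
```

If this reports False, pip will not offer any 1.x release for that interpreter, and a separate environment with an older Python would be needed for the downgrade.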

@divamgupta
Owner

Hi, this has been fixed now.
You can uninstall the existing keras_segmentation and install the master branch using: pip install git+https://github.com/divamgupta/image-segmentation-keras
That should solve the error.

@rosegit-lab

It is working now, thank you!

@rosegit-lab

Hi again @divamgupta,

I assumed your fix had solved the problem, but it hasn't. After the Keras update, two different files are generated (the weights file is corrupted). The file ending in '.index' is the wrong file: it is so small that it cannot contain the saved weights. So even though the code appears to run without error, it does not actually work.
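
For context: when weights are saved in the TensorFlow checkpoint format (the default in TF 2.x when the path has no `.h5` suffix), a small `.index` file is written alongside one or more `.data-XXXXX-of-YYYYY` shard files that hold the tensor values. A small `.index` file on its own is therefore expected and not necessarily a sign of corruption. The sketch below checks that both parts are present for a given prefix; the helper names are hypothetical and only the standard library is used.

```python
import glob
import os


def checkpoint_parts(prefix):
    """Return (index_exists, data_shards) for a TensorFlow-format checkpoint prefix."""
    index_exists = os.path.isfile(prefix + ".index")
    # glob.escape guards against special characters such as "[" in the path
    data_shards = sorted(glob.glob(glob.escape(prefix) + ".data-*"))
    return index_exists, data_shards


def looks_complete(prefix):
    """True when both the index file and at least one data shard are present."""
    index_exists, data_shards = checkpoint_parts(prefix)
    return index_exists and bool(data_shards)
```

For the file name reported earlier, the call would be `looks_complete("vgg_unet_1.1")`, run from the directory that holds the checkpoints.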

@divamgupta
Owner

Thanks, I am looking into the issue.

@divamgupta
Owner

Something major did change after the Keras update. The sample model with the sample dataset gives very good accuracy with older Keras/TF versions, but with the new versions even the training accuracy is very bad.

@rosegit-lab

Which versions of Keras and TensorFlow work?

@divamgupta
Owner

It works perfectly with TF 1.14.0 and keras 2.1.4

@divamgupta
Owner

It's fixed now. Everything should work with the latest TF/Keras versions.
Here is a working Google Colab example: https://colab.research.google.com/drive/1Kpy4QGFZ2ZHm69mPfkmLSUes8kj6Bjyi?usp=sharing

@wassi12

wassi12 commented Jul 11, 2021

> It's fixed now. Everything should work with the latest TF/Keras versions.
> Here is a working Google Colab example: https://colab.research.google.com/drive/1Kpy4QGFZ2ZHm69mPfkmLSUes8kj6Bjyi?usp=sharing

Hi @divamgupta. The link above was working perfectly earlier, but now it gives this error:
AttributeError: module 'keras.utils' has no attribute 'get_file'.
None of my attempts to solve this have worked.
Could you please point me toward a solution?
Note: Colab has TF and Keras 2.5.0 installed.
I have also tried other version combinations, none of which worked.
I would greatly appreciate a prompt response.

@Ankit-Vohra

@wassi12 were you able to solve the problem? I'm stuck on the same issue.
@divamgupta, it would be really helpful if you could share a solution for this.

@eyildiz-ugoe

eyildiz-ugoe commented Aug 15, 2022

> It's fixed now. Everything should work with the latest TF/Keras versions. Here is a working Google Colab example: https://colab.research.google.com/drive/1Kpy4QGFZ2ZHm69mPfkmLSUes8kj6Bjyi?usp=sharing

No, it's not fixed.

With TensorFlow 2.9.0, this is what I get:


AssertionError                            Traceback (most recent call last)
Input In [16], in <cell line: 8>()
      4 print("Tensorflow:", tf.version)
      7 model_save_path = "./model/vgg_unet_2022-08-09_19:42:06"
----> 8 model = model_from_checkpoint_path(model_save_path)

File ~/anaconda3/envs/3dreconst/lib/python3.9/site-packages/keras_segmentation/predict.py:24, in model_from_checkpoint_path(checkpoints_path)
     21 def model_from_checkpoint_path(checkpoints_path):
     23     from .models.all_models import model_from_name
---> 24     assert (os.path.isfile(checkpoints_path+"_config.json")
     25             ), "Checkpoint not found."
     26     model_config = json.loads(
     27         open(checkpoints_path+"_config.json", "r").read())
     28     latest_weights = find_latest_checkpoint(checkpoints_path)

AssertionError: Checkpoint not found.
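
The traceback shows two things that `model_from_checkpoint_path` looks for: a file named `checkpoints_path + "_config.json"`, and then the weight files located by `find_latest_checkpoint`. A small diagnostic sketch that checks both for a given path is shown below. The helper `diagnose_checkpoint_path` is hypothetical and not part of keras_segmentation; it only mirrors the two lookups visible in the traceback.

```python
import glob
import os


def diagnose_checkpoint_path(checkpoints_path):
    """Report whether the config file and any weight files exist for a prefix."""
    config_file = checkpoints_path + "_config.json"
    # Weight files are expected to share the prefix followed by a dot
    weight_files = sorted(glob.glob(glob.escape(checkpoints_path) + ".*"))
    return {
        "config_file": config_file,
        "config_exists": os.path.isfile(config_file),
        "weight_files": weight_files,
    }


if __name__ == "__main__":
    # Path copied from the traceback above
    print(diagnose_checkpoint_path("./model/vgg_unet_2022-08-09_19:42:06"))
```

If `config_exists` is False, the assertion on line 24 of the traceback is the one that fires, regardless of whether any weight files are present.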

@GruMinion

I'm getting a similar error as of today.

@hayatkhan8660-maker

hayatkhan8660-maker commented Sep 11, 2022

Traceback (most recent call last):
  File "test.py", line 46, in
    out_dir="F:/Hayat Data/Hayat master's PC backup/drive volume G/image-segmentation-keras-master/fire_segmentation/shufflenet_unet_results/shufflenet_unet11/"
  File "F:\Hayat Data\Hayat master's PC backup\drive volume G\image-segmentation-keras-master\fire_segmentation\keras_segmentation\predict.py", line 194, in predict_multiple
    model = model_from_checkpoint_path(checkpoints_path)
  File "F:\Hayat Data\Hayat master's PC backup\drive volume G\image-segmentation-keras-master\fire_segmentation\keras_segmentation\predict.py", line 29, in model_from_checkpoint_path
    assert (latest_weights is not None), "Checkpoint not found."
AssertionError: Checkpoint not found.

I tried reinstalling keras_segmentation, but it's still not working.
Any help would be appreciated.

@hayatkhan8660-maker

> I am trying to run this command after I have trained the network but it is giving an error.
>
> python -m keras_segmentation predict \
>  --checkpoints_path="path_to_checkpoints" \
>  --input_path="dataset1/images_prepped_test/" \
>  --output_path="path_to_predictions"
>
>   File "C:\Users\Caiow\AppData\Local\Programs\Python\Python38\lib\site-packages\keras_segmentation\predict.py", line 175, in predict_multiple
>     model = model_from_checkpoint_path(checkpoints_path)
>   File "C:\Users\Caiow\AppData\Local\Programs\Python\Python38\lib\site-packages\keras_segmentation\predict.py", line 29, in model_from_checkpoint_path
>     assert (latest_weights is not None), "Checkpoint not found."
> AssertionError: Checkpoint not found.

Have you solved this problem?

@hayatkhan8660-maker

I solved it :) :)

@Xenon1997

I edited the function find_latest_checkpoint(checkpoints_path, fail_safe=True) and it works:

import glob


def find_latest_checkpoint(checkpoints_path, fail_safe=True):

    def get_epoch_number_from_path(path):
        # Take the last backslash-separated component (Windows-style paths)
        # and strip leading/trailing dots. The split on "\\" is the line
        # that was highlighted in the original post.
        t_path = path.split("\\")[-1]
        m_path = t_path.strip(".")
        return m_path

    # Collect every file that starts with the checkpoint prefix plus a dot
    all_checkpoint_files = glob.glob(checkpoints_path + ".*")

    # Newer TF/Keras versions write "<name>.index"; drop that suffix
    all_checkpoint_files = [ff.replace(".index", "")
                            for ff in all_checkpoint_files]
    # Keep only entries whose epoch-number part is purely numeric
    all_checkpoint_files = list(filter(lambda f: get_epoch_number_from_path(f)
                                       .isdigit(), all_checkpoint_files))
    print(all_checkpoint_files)
    if not len(all_checkpoint_files):
        # The filtered list is empty: no checkpoint matches checkpoints_path
        if not fail_safe:
            raise ValueError("Checkpoint path {0} invalid"
                             .format(checkpoints_path))
        else:
            return None

    # Find the checkpoint file with the maximum epoch
    latest_epoch_checkpoint = max(all_checkpoint_files,
                                  key=lambda f:
                                  int(get_epoch_number_from_path(f)))
    return latest_epoch_checkpoint
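
The version above splits on a backslash, so it is tied to Windows-style paths. A separator-independent variant could look like the sketch below. It is an illustration, not the library's code: the name `find_latest_checkpoint_portable` and the regular expression are this example's own choices, and it assumes weight files are named `<prefix>.<epoch>` with an optional `.index` suffix, as reported earlier in the thread.

```python
import glob
import os
import re

# Matches the part after the prefix: ".<epoch>" or ".<epoch>.index"
_EPOCH_RE = re.compile(r"\.(\d+)(?:\.index)?")


def find_latest_checkpoint_portable(checkpoints_path):
    """Return the checkpoint path with the highest epoch, or None if none match."""
    prefix_name = os.path.basename(checkpoints_path)
    best_epoch, best_path = -1, None
    for path in glob.glob(glob.escape(checkpoints_path) + ".*"):
        # Look only at what follows the prefix in the file name
        rest = os.path.basename(path)[len(prefix_name):]
        match = _EPOCH_RE.fullmatch(rest)
        if match is None:
            continue  # e.g. ".data-00000-of-00001" shards or unrelated files
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch = epoch
            # Drop a trailing ".index" so the result is the checkpoint prefix
            best_path = path[:-len(".index")] if path.endswith(".index") else path
    return best_path
```

Using `os.path.basename` instead of splitting on a literal separator lets the same function run on Windows, Linux, and macOS. Returning None when nothing matches mirrors the `fail_safe=True` behavior of the function above.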


9 participants