CNN2 training error #33

Closed
ethanbass opened this issue Feb 1, 2024 · 2 comments
Comments

@ethanbass
ethanbass commented Feb 1, 2024

Hi,

I am trying to train a CNN2 model using some images I annotated and have been running into an error related to the sample weights.

The command I'm running is:

amf train -net trained_networks/CNN2v1.h5 -s -o ~/Downloads/CNN2_model -CNN2 /Users/ethanbass/Pictures/mcrz_training_cropped_1500px/selected/CNN2_training/*.jpg

The routine goes through the "tile extraction" OK, but then returns the following error:

[17:34:10] Class weights
    - ConvNet A: 969 active (weight: 0.93), 832 inactive (weight: 1.08).
    - ConvNet V: 965 active (weight: 0.93), 836 inactive (weight: 1.08).
    - ConvNet H: 690 active (weight: 1.31), 1111 inactive (weight: 0.81).
    - ConvNet I: 750 active (weight: 1.20), 1051 inactive (weight: 0.86).
Traceback (most recent call last):
  File "/opt/local/bin/amf", line 77, in <module>
    main()
  File "/opt/local/bin/amf", line 55, in main
    AmfTrain.run(input_files)
  File "/Users/ethanbass/software/amfinder-apple-silicon/amf/amfinder_train.py", line 597, in run
    his = model.fit(t_gen.flow(xt, yt, batch_size=bs),
  File "/Users/ethanbass/miniforge3/envs/amf6/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/ethanbass/miniforge3/envs/amf6/lib/python3.9/site-packages/keras/engine/data_adapter.py", line 1424, in _make_class_weight_map_fn
    raise ValueError(error_msg)
ValueError: Expected `class_weight` to be a dict with keys from 0 to one less than the number of classes, found {'A': {0: 1.0823317307692308, 1: 0.9293085655314758}, 'V': {0: 1.0771531100478469, 1: 0.933160621761658}, 'H': {0: 0.8105310531053105, 1: 1.305072463768116}, 'I': {0: 0.8568030447193149, 1: 1.2006666666666668}}
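The weights printed in the log appear to follow the "balanced" scheme n_total / (2 * n_class), with class 0 meaning inactive and class 1 meaning active; this is inferred from the numbers, and AMFinder's own code may compute them differently. The sketch below reproduces the dict from the error message under that assumption and shows why Keras rejects it: the top-level keys are output names ('A', 'V', 'H', 'I'), whereas `class_weight` expects flat integer class indices 0 to n_classes - 1. The helper `balanced_weights` is hypothetical, written only for this illustration.

```python
# Per-output counts copied from the "Class weights" log: (active, inactive).
counts = {
    "A": (969, 832),
    "V": (965, 836),
    "H": (690, 1111),
    "I": (750, 1051),
}


def balanced_weights(n_active, n_inactive):
    """Hypothetical helper: {0: inactive weight, 1: active weight}
    using the balanced formula n_total / (2 * n_class)."""
    total = n_active + n_inactive
    return {0: total / (2 * n_inactive), 1: total / (2 * n_active)}


# One {class index: weight} dict per network output, nested under the output name.
class_weight = {name: balanced_weights(*c) for name, c in counts.items()}
print(class_weight)

# Keras validates that class_weight keys are 0..n_classes-1; here the
# top-level keys are output names, which triggers the ValueError above.
print(sorted(class_weight))
```

With a single-output model, passing one inner dict such as `class_weight["A"]` would satisfy the check; the problem only arises because CNN2 has four outputs.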

I thought at first this might be related to the configuration of my Python environment, since I had to change the versions of some dependencies to get AMfinder installed on my M1 Mac. However, I get the same error on a Windows 11 computer in the lab running a standard installation of AMfinder on the Windows Subsystem for Linux (WSL).

I think the error may be related to this issue on the TensorFlow GitHub page (tensorflow/tensorflow#41448), and perhaps also to the issue mentioned in the documentation of the class_weights function:

    Tensorflow 2.1 (and Keras 2.3.1). A bug in TF makes it
    impossible to use class_weights to models with multiple
    outputs. This bug is active on January 2021.
    Reference: https://github.com/tensorflow/tensorflow/issues/40457

But if this is still an issue in TensorFlow versions above 2.1, why do the AMfinder requirements no longer reflect this? Perhaps there's something I'm missing. I would be grateful for any suggestions!

Ethan

Update: I was able to get this running on WSL by reverting to the 2021 release of AMfinder (https://github.com/SchornacklabSLCU/amfinder/releases/tag/v2.0), so it does seem to be the TensorFlow bug I mentioned. Unfortunately, I don't think this will help on my Mac, since TensorFlow 2.1 is not available (as far as I can tell) for the M1 architecture. If TensorFlow 2.1 is in fact required for CNN2 training, it would be helpful to state this in the installation instructions.

Is there perhaps a workaround that works with newer versions of TensorFlow? The bug seems unlikely to be fixed any time soon, given that people have been reporting it for three years with very little response from the developers.
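One workaround often suggested for the multi-output `class_weight` limitation is to convert each output's class weights into per-sample weights and pass those as `sample_weight` instead. The sketch below shows only the conversion step; it has not been tested with AMFinder. The function name `class_to_sample_weights` and the label layout (one 1-D array of integer 0/1 labels per output) are hypothetical assumptions. Per the Keras documentation, when training from a generator the weights cannot be passed via the `sample_weight` argument of `fit` and must instead be yielded as the third element of each `(x, y, sample_weight)` batch, so AMFinder's `t_gen.flow(...)` pipeline would need adapting accordingly.

```python
import numpy as np


def class_to_sample_weights(labels, class_weight):
    """Hypothetical helper: map each output's integer labels to per-sample weights.

    labels:       dict output name -> 1-D array of integer class labels (0/1 here)
    class_weight: dict output name -> {class index: weight}
    Assumes class indices are contiguous, starting at 0.
    """
    sample_weight = {}
    for name, y in labels.items():
        y = np.asarray(y, dtype=int)
        # Lookup table indexed by class id: table[k] == class_weight[name][k].
        table = np.array([class_weight[name][k] for k in sorted(class_weight[name])])
        # Fancy indexing gathers one weight per sample from its label.
        sample_weight[name] = table[y]
    return sample_weight


# Toy labels and weights (made up for illustration), keyed by output name.
labels = {"A": [1, 0, 1], "H": [0, 0, 1]}
cw = {"A": {0: 1.08, 1: 0.93}, "H": {0: 0.81, 1: 1.31}}
sw = class_to_sample_weights(labels, cw)
print(sw["A"], sw["H"])
```

The resulting dict has the same keys as the label dict, which is the nested structure Keras generally expects for per-output sample weights in multi-output models; whether the output names must match the model's layer names depends on how the model and its targets are defined.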

@EEvangelisti
Collaborator

Hi,

Apologies for the late reply.
Indeed, this is a long-standing bug in TensorFlow. I will clarify the documentation.
Thanks for the feedback.

Best wishes,
Edouard

@EEvangelisti
Collaborator

I assume the issue is now solved. Feel free to reopen it if needed.
