
Issue when fine-tuning Albert - Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist. #1573

Open
deathsaber opened this issue Apr 10, 2024 · 3 comments

@deathsaber

deathsaber commented Apr 10, 2024

Describe the bug
I am fine-tuning the Keras implementation of Albert on my dataset for a classification problem, following the documentation here - https://keras.io/api/keras_nlp/models/albert/albert_classifier/

The gist of how I am creating the model and fitting it is given below:

model = keras_nlp.models.AlbertClassifier.from_preset(
    'albert_extra_extra_large_en_uncased',
    preprocessor=keras_nlp.models.AlbertPreprocessor.from_preset(
        'albert_extra_extra_large_en_uncased',
        sequence_length=128,
    ),
    num_classes=4,
    load_weights=True,
    activation='softmax',
)
model.backbone.trainable = False
model.fit(x=train_x, y=train_y)

When the fitting process runs, it always errors out with the stack trace shown below. Based on the error message, I suspect it is related to SentencePiece somehow not being detected.

What might I be doing wrong here? Am I missing something obvious?

2024-04-11 04:32:58.042517: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 5086726160540042098
Traceback (most recent call last):
  File "/mnt/d/bot_projects/incident-classification/train.py", line 27, in <module>
    albert.AlBERTClassifier().train()
  File "/mnt/d/bot_projects/incident-classification/models/albert.py", line 67, in train
    stats = self.model.fit(x=x.tolist(), y=y.tolist(), validation_data=(t_x.tolist(), t_y.tolist()), batch_size=self.conf['modelParams']['albert']['batchSize'],
  File "/root/classifier/.venv/lib/python3.9/site-packages/keras_nlp/src/utils/pipeline_model.py", line 188, in fit
    return super().fit(
  File "/root/classifier/.venv/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/root/classifier/.venv/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:

Detected at node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp defined at (most recent call last):
<stack traces unavailable>
Detected at node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp defined at (most recent call last):
<stack traces unavailable>
2 root error(s) found.
  (0) NOT_FOUND:  Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
         [[{{node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp}}]]
         [[IteratorGetNext]]
         [[IteratorGetNext/_6]]
  (1) NOT_FOUND:  Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
         [[{{node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp}}]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_17437]

To Reproduce
Run the following code:

import keras_nlp
model = keras_nlp.models.AlbertClassifier.from_preset(
           'albert_extra_extra_large_en_uncased',
            preprocessor=keras_nlp.models.AlbertPreprocessor.from_preset(
            'albert_extra_extra_large_en_uncased',
            sequence_length=128,
        ),
            num_classes=4),
            load_weights=True,
            activation='softmax',
        )
model.backbone.trainable = False
model.fit(x=['foo bar'], y=[0])

Expected behavior
The Albert preprocessor detects SentencePiece and uses it for its preprocessing tasks.

Additional context
Things I have tried to fix it, with no success:

  • Uninstall and reinstall sentencepiece
  • Downgrade tensorflow-text to 2.15.1
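
One thing I noticed while debugging (my assumption, not verified): a NotFound error for a custom op like this can apparently happen when tensorflow-text was built against a different TensorFlow minor version than the one installed, so the SentencePiece kernels never get registered. A rough sketch of the compatibility check I used, assuming tensorflow-text releases track TensorFlow's minor version:

```python
def tf_text_matches_tf(tf_version: str, tf_text_version: str) -> bool:
    """Compare only the major.minor components; patch levels may differ.

    Assumption: tensorflow-text wheels are built per TensorFlow minor
    release, so a major.minor mismatch can leave custom ops such as
    SentencepieceOp unregistered at runtime.
    """
    return tf_version.split(".")[:2] == tf_text_version.split(".")[:2]

# The downgrade I tried above would produce exactly this kind of mismatch:
print(tf_text_matches_tf("2.16.1", "2.15.1"))  # False
print(tf_text_matches_tf("2.16.1", "2.16.1"))  # True
```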
@mattdangerw
Member

Thanks for filing! I suspect this might be related to the environment. Trying your snippet on a Colab, I don't see the issue.
https://colab.research.google.com/gist/mattdangerw/7639862e2d45ab55a3634c0d3f965265/try-snippet.ipynb

So some more information would be useful for diagnosing this.

How are you running this? How did you install the deps? OS version? Python version? TF version? sentencepiece version?

Thanks!

@deathsaber
Author

deathsaber commented Apr 19, 2024

Hi @mattdangerw ,

Apologies for the delayed response. I am happy to provide more information to help diagnose the issue. I noticed that the code ran successfully on your Colab notebook, and your keras-nlp and tf versions matched mine. This led me to suspect that the CUDA or cuDNN versions could have something to do with it. I have included the GPU details as well, in case they help. Please let me know if you need more information or if there are any steps you'd like me to try.

How am I running this

I have tried it on two separate machines - my desktop (WSL2 on Win 11) and an AWS EC2 instance (RHEL 9).
After creating a Python virtual environment and installing the dependencies into it, I simply run a Python script with the virtual environment activated. (Details shared below.)

OS Versions -

Tried on a couple of OS's.

1. Kali Linux 2021.4 on WSL2 (Win 11)
2. RHEL 9 on an AWS EC2

Python Version -

Python 3.9 (same for both the OS's)

How the deps were installed

On a clean system, a Python virtual env was created and activated, and the deps were installed into it via pip.
(.venv) xxx@xxx:~# pip install -U pip && pip install keras-nlp tensorflow[and-cuda] sentencepiece

Tensorflow and Sentencepiece version -

tensorflow==2.16.1
sentencepiece==0.2.0

GPU and CUDA Details

  1. On the WSL2 machine - NVidia RTX 3060Ti, CUDA 12-3, CuDNN 8907
  2. On the RHEL (EC2) machine - NVidia A10G, CUDA 12-3, CuDNN 8907

@deathsaber
Author

deathsaber commented May 3, 2024

@mattdangerw - Hate to bother you, but have you had a chance to look into this issue?
Not sure why, but apart from the BERT and RoBERTa classifiers, the other BERT-based models like DeBERTa and ALBERT keep failing for me with the same error, no matter where I run them (apart from Colab).
Should I run a pip freeze and list the packages I have installed, in case that helps?
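
If a full pip freeze would be too noisy, I could filter it down to just the relevant pins with something like this (a rough sketch; the prefixes are just the PyPI distribution names I believe are involved):

```python
# Distributions plausibly involved in this error (an assumption on my part).
RELEVANT_PREFIXES = ("tensorflow", "keras", "sentencepiece")

def relevant_pins(freeze_output: str) -> list:
    """Keep only the `pip freeze` lines whose distribution name starts
    with one of the packages implicated in this error."""
    pins = []
    for line in freeze_output.splitlines():
        name = line.split("==")[0].strip().lower()
        if name.startswith(RELEVANT_PREFIXES):
            pins.append(line.strip())
    return pins

# Illustrative versions only, not my actual environment:
sample = "keras-nlp==0.8.2\nnumpy==1.26.4\ntensorflow==2.16.1\ntensorflow-text==2.16.1"
print(relevant_pins(sample))
# ['keras-nlp==0.8.2', 'tensorflow==2.16.1', 'tensorflow-text==2.16.1']
```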
