
Issue when fine-tuning Albert - Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist. #1573

Open
deathsaber opened this issue Apr 10, 2024 · 3 comments

@deathsaber

deathsaber commented Apr 10, 2024

Describe the bug
I am fine-tuning the Keras implementation of Albert on my dataset for a classification problem, following the documentation here - https://keras.io/api/keras_nlp/models/albert/albert_classifier/

The gist of how I am creating the model and fitting it is given below:

model = keras_nlp.models.AlbertClassifier.from_preset(
    'albert_extra_extra_large_en_uncased',
    preprocessor=keras_nlp.models.AlbertPreprocessor.from_preset(
        'albert_extra_extra_large_en_uncased',
        sequence_length=128,
    ),
    num_classes=4,
    load_weights=True,
    activation='softmax',
)
model.backbone.trainable = False
model.fit(x=train_x, y=train_y)

When the fitting process runs, it always errors out with the stack trace shown below. Based on the error message, I suspect it is related to SentencePiece somehow not being detected.

What might I be doing wrong here? Am I missing something obvious?

2024-04-11 04:32:58.042517: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 5086726160540042098
Traceback (most recent call last):
  File "/mnt/d/bot_projects/incident-classification/train.py", line 27, in <module>
    albert.AlBERTClassifier().train()
  File "/mnt/d/bot_projects/incident-classification/models/albert.py", line 67, in train
    stats = self.model.fit(x=x.tolist(), y=y.tolist(), validation_data=(t_x.tolist(), t_y.tolist()), batch_size=self.conf['modelParams']['albert']['batchSize'],
  File "/root/classifier/.venv/lib/python3.9/site-packages/keras_nlp/src/utils/pipeline_model.py", line 188, in fit
    return super().fit(
  File "/root/classifier/.venv/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/root/classifier/.venv/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:

Detected at node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp defined at (most recent call last):
<stack traces unavailable>
Detected at node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp defined at (most recent call last):
<stack traces unavailable>
2 root error(s) found.
  (0) NOT_FOUND:  Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
         [[{{node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp}}]]
         [[IteratorGetNext]]
         [[IteratorGetNext/_6]]
  (1) NOT_FOUND:  Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
         [[{{node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp}}]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_17437]

To Reproduce
Run the following code:

import keras_nlp
model = keras_nlp.models.AlbertClassifier.from_preset(
           'albert_extra_extra_large_en_uncased',
            preprocessor=keras_nlp.models.AlbertPreprocessor.from_preset(
            'albert_extra_extra_large_en_uncased',
            sequence_length=128,
        ),
            num_classes=4),
            load_weights=True,
            activation='softmax',
        )
model.backbone.trainable = False
model.fit(x=['foo bar'], y=[0])

Expected behavior
The Albert preprocessor detects SentencePiece and uses it for its preprocessing tasks.

Additional context
Things I have tried to fix it, with no success:

  • Uninstall and reinstall sentencepiece
  • Downgrade tensorflow-text to 2.15.1
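
One thing I noticed while debugging (my assumption, not verified): a NotFound error for a custom op like this can apparently happen when tensorflow-text was built against a different TensorFlow minor version than the one installed, so the SentencePiece kernels never get registered. A rough sketch of the compatibility check I used, assuming tensorflow-text releases track TensorFlow's minor version:

```python
def tf_text_matches_tf(tf_version: str, tf_text_version: str) -> bool:
    """Compare only the major.minor components; patch levels may differ.

    Assumption: tensorflow-text wheels are built per TensorFlow minor
    release, so a major.minor mismatch can leave custom ops such as
    SentencepieceOp unregistered at runtime.
    """
    return tf_version.split(".")[:2] == tf_text_version.split(".")[:2]

# The downgrade I tried above would produce exactly this kind of mismatch:
print(tf_text_matches_tf("2.16.1", "2.15.1"))  # False
print(tf_text_matches_tf("2.16.1", "2.16.1"))  # True
```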
@mattdangerw
Member

Thanks for filing! I suspect this might be related to the environment. Trying your snippet on a Colab, I don't see the issue.
https://colab.research.google.com/gist/mattdangerw/7639862e2d45ab55a3634c0d3f965265/try-snippet.ipynb

So some more information would be useful for diagnosing this.

How are you running this? How did you install the deps? OS version? Python version? TF version? sentencepiece version?

Thanks!

@deathsaber
Author

deathsaber commented Apr 19, 2024

Hi @mattdangerw ,

Apologies for the delayed response. I am happy to provide more information to help diagnose the issue. I noticed that the code ran successfully on your Colab notebook, and your keras-nlp and tf versions matched mine. This led me to suspect that the CUDA or cuDNN versions could have something to do with it. I have included the GPU details as well, in case they help. Please let me know if you need more information or if there are any steps you'd like me to try.

How am I running this

I have tried it on two separate machines - my desktop (WSL2 on Win 11) and an AWS EC2 instance (RHEL 9).
After creating a Python virtual environment and installing the dependencies into it, I simply run a Python script with the virtual environment activated. (Details shared below.)

OS Versions -

Tried on a couple of OS's.

1. Kali Linux 2021.4 on WSL2 (Win 11)
2. RHEL 9 on an AWS EC2

Python Version -

Python 3.9 (same for both the OS's)

How the deps were installed

On a clean system, a Python virtual env was created and activated, and the deps were installed into it via pip.
(.venv) xxx@xxx:~# pip install -U pip && pip install keras-nlp tensorflow[and-cuda] sentencepiece

Tensorflow and Sentencepiece version -

tensorflow==2.16.1
sentencepiece==0.2.0

GPU and CUDA Details

  1. On the WSL2 machine - NVidia RTX 3060Ti, CUDA 12-3, CuDNN 8907
  2. On the RHEL (EC2) machine - NVidia A10G, CUDA 12-3, CuDNN 8907

@deathsaber
Author

deathsaber commented May 3, 2024

@mattdangerw - Hate to bother you, but have you had a chance to look into this issue?
Not sure why, but apart from the BERT and RoBERTa classifiers, the other BERT-based models like DeBERTa and ALBERT keep failing for me with the same error, no matter where I run them (apart from Colab).
Should I run a pip freeze and list the packages I have installed, in case that helps?
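
If a full pip freeze would be too noisy, I could filter it down to just the relevant pins with something like this (a rough sketch; the prefixes are just the PyPI distribution names I believe are involved):

```python
# Distributions plausibly involved in this error (an assumption on my part).
RELEVANT_PREFIXES = ("tensorflow", "keras", "sentencepiece")

def relevant_pins(freeze_output: str) -> list:
    """Keep only the `pip freeze` lines whose distribution name starts
    with one of the packages implicated in this error."""
    pins = []
    for line in freeze_output.splitlines():
        name = line.split("==")[0].strip().lower()
        if name.startswith(RELEVANT_PREFIXES):
            pins.append(line.strip())
    return pins

# Illustrative versions only, not my actual environment:
sample = "keras-nlp==0.8.2\nnumpy==1.26.4\ntensorflow==2.16.1\ntensorflow-text==2.16.1"
print(relevant_pins(sample))
# ['keras-nlp==0.8.2', 'tensorflow==2.16.1', 'tensorflow-text==2.16.1']
```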
