When the fitting process runs, it always errors out with the stack trace shown below. Based on the error message, I suspect it is related to SentencePiece somehow not being detected.
What might I be doing wrong here? Am I missing something obvious?
2024-04-11 04:32:58.042517: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 5086726160540042098
Traceback (most recent call last):
File "/mnt/d/bot_projects/incident-classification/train.py", line 27, in <module>
albert.AlBERTClassifier().train()
File "/mnt/d/bot_projects/incident-classification/models/albert.py", line 67, in train
stats = self.model.fit(x=x.tolist(), y=y.tolist(), validation_data=(t_x.tolist(), t_y.tolist()), batch_size=self.conf['modelParams']['albert']['batchSize'],
File "/root/classifier/.venv/lib/python3.9/site-packages/keras_nlp/src/utils/pipeline_model.py", line 188, in fit
return super().fit(
File "/root/classifier/.venv/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/root/classifier/.venv/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:
Detected at node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp defined at (most recent call last):
<stack traces unavailable>
Detected at node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp defined at (most recent call last):
<stack traces unavailable>
2 root error(s) found.
(0) NOT_FOUND: Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
[[{{node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_6]]
(1) NOT_FOUND: Resource localhost/_0_SentencepieceOp/N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE does not exist.
[[{{node albert_preprocessor/albert_tokenizer/SentenceTokenizer/SentencepieceTokenizeOp}}]]
[[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_17437]
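The resource named in the two NOT_FOUND lines is an Itanium-ABI mangled C++ type name, and decoding it shows which library owns the missing object. The sketch below is a deliberately minimal demangler that only handles the simple nested-name form (`N<len><identifier>...E`) that appears in this trace. It is not a general demangler; tools such as `c++filt` cover the full grammar.

```python
import re

# The Itanium C++ ABI spells the anonymous namespace as "_GLOBAL__N_1".
_ANON_NAMESPACE = "_GLOBAL__N_1"


def demangle_nested_name(mangled: str) -> str:
    """Demangle a simple Itanium nested name, e.g. 'N3foo3barE' -> 'foo::bar'.

    Only the N<len><identifier>...E form is supported; anything else raises ValueError.
    """
    if not (mangled.startswith("N") and mangled.endswith("E")):
        raise ValueError(f"not a simple nested name: {mangled!r}")
    body = mangled[1:-1]
    parts = []
    pos = 0
    while pos < len(body):
        # Each component is a decimal length followed by that many characters.
        match = re.match(r"\d+", body[pos:])
        if match is None:
            raise ValueError(f"expected a length prefix at offset {pos} in {mangled!r}")
        length = int(match.group())
        start = pos + len(match.group())
        ident = body[start:start + length]
        if len(ident) != length:
            raise ValueError(f"truncated identifier in {mangled!r}")
        parts.append("(anonymous namespace)" if ident == _ANON_NAMESPACE else ident)
        pos = start + length
    return "::".join(parts)


resource = "N10tensorflow4text12_GLOBAL__N_121SentencepieceResourceE"
print(demangle_nested_name(resource))
```

This decodes to `tensorflow::text::(anonymous namespace)::SentencepieceResource`, i.e. a SentencePiece resource defined in TensorFlow Text. That points at the TensorFlow Text op backing the tokenizer rather than at Keras itself, which fits the suspicion above.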
Apologies for the delayed response. I'm happy to provide more information to help diagnose the issue. I noticed that the code ran successfully in your Colab notebook and that your keras-nlp and TensorFlow versions matched mine. That made me suspect the CUDA or cuDNN versions might have something to do with this, so I have included the GPU details as well. Please let me know if you need more information or if there are any steps you'd like me to try.
How I am running this
I have tried it on two separate machines: my desktop (WSL2 on Windows 11) and an AWS EC2 instance (RHEL 9).
After creating a Python virtual environment and installing the dependencies in it, I run a Python script with the virtual environment activated (details below).
OS versions:
Tried on two operating systems:
1. Kali Linux 2021.4 on WSL2 (Win 11)
2. RHEL 9 on an AWS EC2
Python version:
Python 3.9 (same on both machines)
How the dependencies were installed
On a clean system, I created and activated a Python virtual environment, then installed the dependencies into it via pip:
(.venv) xxx@xxx:~# pip install -U pip && pip install keras-nlp tensorflow[and-cuda] sentencepiece
TensorFlow and SentencePiece versions:
tensorflow==2.16.1
sentencepiece==0.2.0
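Since the failing op (`SentencepieceTokenizeOp`) is implemented in TensorFlow Text, its version is worth reporting alongside the two above, even though `tensorflow-text` does not appear in the pip command. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+) to print the installed versions and flag a TensorFlow / TensorFlow Text major.minor mismatch. The assumption that the two should share major.minor is based on TensorFlow Text being released in step with TensorFlow; treat a mismatch as something to rule out, not as a confirmed cause.

```python
from importlib import metadata

# Distributions relevant to the failing SentencePiece op.
PACKAGES = ["tensorflow", "tensorflow-text", "keras", "keras-nlp", "sentencepiece"]


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def same_minor(a: str, b: str) -> bool:
    """True if two 'X.Y.Z' version strings share the same major.minor."""
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name:16} {version or 'not installed'}")
    tf, tf_text = found["tensorflow"], found["tensorflow-text"]
    if tf and tf_text and not same_minor(tf, tf_text):
        print(f"warning: tensorflow {tf} and tensorflow-text {tf_text} differ in major.minor")
```

Running this inside each virtual environment (WSL2, EC2, and Colab) gives a compact side-by-side comparison, narrower than a full `pip freeze`.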
GPU and CUDA details
On the WSL2 machine: NVIDIA RTX 3060 Ti, CUDA 12.3, cuDNN 8907
On the RHEL (EC2) machine: NVIDIA A10G, CUDA 12.3, cuDNN 8907
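The cuDNN figure `8907` is the packed integer that cuDNN 8 reports, encoded as major*1000 + minor*100 + patch. The helper below unpacks it. It assumes the cuDNN 8 packing only; later major versions changed the scheme, so it deliberately rejects values outside the 8.x range.

```python
def decode_cudnn8_version(packed: int) -> str:
    """Unpack a cuDNN 8.x version integer (major*1000 + minor*100 + patch)."""
    if not 8000 <= packed < 9000:
        raise ValueError(f"{packed} does not look like a cuDNN 8.x version number")
    major, rest = divmod(packed, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


print(decode_cudnn8_version(8907))  # 8.9.7
```

So both machines run cuDNN 8.9.7 with CUDA 12.3, which, per TensorFlow's published tested build configurations, matches what TensorFlow 2.16 was built against (CUDA 12.3, cuDNN 8.9).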
@mattdangerw - Hate to bother you, but did you get a chance to take a look at this issue?
I'm not sure why, but apart from the BERT and RoBERTa classifiers, other BERT-based models such as DeBERTa and ALBERT keep failing for me with the same error wherever I try them (except on Colab).
Should I run pip freeze and list the packages I have installed, in case that helps?
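There is a pattern in which models fail. In KerasNLP, `BertTokenizer` is built on `WordPieceTokenizer` and `RobertaTokenizer` on `BytePairTokenizer`, whereas `AlbertTokenizer` and `DebertaV3Tokenizer` are built on `SentencePieceTokenizer`. The sketch below simply encodes that observation (the mapping is written out by hand, not queried from the library, and it assumes "DeBERTa" here means KerasNLP's `DebertaV3Classifier`) and groups the tokenizer bases by reported outcome.

```python
# Hand-written from the KerasNLP tokenizer class hierarchy; not queried at runtime.
# Assumption: "DeBERTa" in the report refers to DebertaV3Classifier.
TOKENIZER_BASE = {
    "BertClassifier": "WordPieceTokenizer",
    "RobertaClassifier": "BytePairTokenizer",
    "DebertaV3Classifier": "SentencePieceTokenizer",
    "AlbertClassifier": "SentencePieceTokenizer",
}


def bases_by_outcome(failing: set) -> dict:
    """Group tokenizer base classes by whether their model failed or worked."""
    outcome = {"failed": set(), "worked": set()}
    for model, base in TOKENIZER_BASE.items():
        outcome["failed" if model in failing else "worked"].add(base)
    return outcome


# Reported outcome: BERT and RoBERTa work, DeBERTa and ALBERT fail.
print(bases_by_outcome({"DebertaV3Classifier", "AlbertClassifier"}))
```

Every failing model shares the SentencePiece-based tokenizer and none of the working ones do, which is consistent with the NOT_FOUND error being raised from the SentencePiece op.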
Describe the bug
I am fine-tuning the Keras implementation of ALBERT on my dataset for a classification problem, following the documentation here - https://keras.io/api/keras_nlp/models/albert/albert_classifier/
The gist of how I am creating the model and fitting it is given below:
To Reproduce
Run the following code:
Expected behavior
The ALBERT preprocessor should detect SentencePiece and use it for its preprocessing tasks.
Additional context
Things I have tried to fix it, without success: