
Error while running tutorial_pretrained_bert.ipynb on Ubuntu 22.04.1 LTS #602

Closed
IsaacRodgzb opened this issue Nov 22, 2022 · 4 comments

Comments

@IsaacRodgzb

IsaacRodgzb commented Nov 22, 2022

Hello,

I'm trying to follow the tutorial notebook (compiling a HF model) on a new inf1.xlarge instance running Ubuntu 22.04.1 (codename jammy). But when I run this part:

import os

import torch
import torch_neuron  # registers torch.neuron.trace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

num_cores = 4 # This value should be 4 on inf1.xlarge and inf1.2xlarge
os.environ['NEURON_RT_NUM_CORES'] = str(num_cores)

# Build tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

max_length=128
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")

# Run the original PyTorch model on the compilation example
paraphrase_classification_logits = model(**paraphrase)[0]

# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']

model_neuron = torch.neuron.trace(
    model,
    example_inputs_paraphrase,
    strict=False,
    compiler_args=["--neuroncore-pipeline-cores", str(num_cores)]
)

I get this output in the console:

INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
2022-11-22 18:43:24.392722: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-22 18:43:24.528353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-11-22 18:43:24.528380: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-22 18:43:25.103517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-11-22 18:43:25.103588: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-11-22 18:43:25.103602: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$698; falling back to native python function call
ERROR:Neuron:Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
Traceback (most recent call last):
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/convert.py", line 381, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/decorators.py", line 134, in trace
    raise RuntimeError(
RuntimeError: Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
Traceback (most recent call last):
  File "/opt/torchserve/neuron_compile.py", line 31, in <module>
    model_neuron = torch.neuron.trace(
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/convert.py", line 492, in stats_post_compiler
    raise RuntimeError(
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
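The RuntimeError above suggests the Neuron compiler could not be invoked from the interpreter running the trace. A quick, generic sanity check is to verify that the compiler's module is importable from that same interpreter; a minimal sketch (the importable module name "neuroncc" for the neuron-cc package is an assumption here, adjust it if your installation exposes a different name):

```python
import importlib.util

def module_importable(name):
    """Return True if the named top-level module can be imported in this interpreter."""
    return importlib.util.find_spec(name) is not None

# "neuroncc" is an assumed module name for the neuron-cc package.
print(module_importable("neuroncc"))
```

Running this in the same virtualenv that performs the trace rules out the case where neuron-cc is installed into a different Python environment than the one torch.neuron is using.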

I had to configure an older repository (focal) in order to install the Neuron drivers, because using 'jammy' would throw an error. I used this:

sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb [arch=amd64 signed-by=/usr/share/keyrings/neuron.gpg] https://apt.repos.neuron.amazonaws.com focal main
EOF

By changing this, I was able to install the required drivers and tools with:

sudo apt-get install aws-neuronx-dkms -y
sudo apt-get install aws-neuron-tools -y

I also installed the necessary Python packages (using Python 3.9). These are all the packages I ended up with:

absl-py==1.3.0
astunparse==1.6.3
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
filelock==3.8.0
flatbuffers==22.10.26
gast==0.4.0
google-auth==2.14.1
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.50.0
h5py==3.7.0
huggingface-hub==0.11.0
idna==3.4
importlib-metadata==5.0.0
joblib==1.2.0
keras==2.11.0
libclang==14.0.6
Markdown==3.4.1
MarkupSafe==2.1.1
neuron-cc==1.0.post1
numpy==1.23.5
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==21.3
Pillow==9.3.0
protobuf==3.19.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==3.0.9
PyYAML==6.0
regex==2022.10.31
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
six==1.16.0
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.11.0
tensorflow-estimator==2.11.0
tensorflow-io-gcs-filesystem==0.28.0
termcolor==2.1.1
tokenizers==0.12.1
torch==1.11.0
torch-neuron==1.11.0.2.3.0.0
torchvision==0.12.0
tqdm==4.64.1
transformers==4.19.4
typing_extensions==4.4.0
urllib3==1.26.12
Werkzeug==2.2.2
wrapt==1.14.1
zipp==3.10.0

Could it be the Ubuntu version? I know the instructions only cover up to Ubuntu 20, but I'm not sure if that might be the issue. Thanks!

@mrnikwaws
Contributor

Hi @IsaacRodgzb,

Python 3.9 is not currently supported. Please try one of the supported Python versions documented here: https://awsdocs-neuron.readthedocs-hosted.com/en/v1.16.1/release-notes/releasecontent.html#dependency-software-supported-versions
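A guard like the following can catch an unsupported interpreter before attempting a trace; the supported set below is only a placeholder for illustration, the authoritative list is in the release notes linked above:

```python
import sys

# Placeholder set for illustration; take the real list from the linked release notes.
SUPPORTED = {(3, 6), (3, 7)}

def interpreter_supported(version_info=None):
    """Check a (major, minor, ...) version tuple against the supported set.

    Defaults to the running interpreter's version.
    """
    vi = sys.version_info if version_info is None else version_info
    return tuple(vi[:2]) in SUPPORTED

print(interpreter_supported())
```

Placing a check like this at the top of a compilation script fails fast with a clear cause, instead of the opaque neuron-cc RuntimeError seen in this issue.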

@IsaacRodgzb
Author

Oh, I hadn't noticed that. Thanks @mrnikwaws! I'll try with 3.8.

@IsaacRodgzb
Author

Should I still follow this, given this issue?

pip install "transformers<4.20.0"
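For reference, whether an installed version satisfies the "transformers<4.20.0" ceiling can be checked with a plain tuple comparison; a minimal sketch that assumes simple X.Y.Z version strings (no pre-release suffixes):

```python
def parse_version(v):
    """Parse a simple 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def satisfies_ceiling(installed, ceiling="4.20.0"):
    """True if installed < ceiling, i.e. the 'transformers<4.20.0' pin holds."""
    return parse_version(installed) < parse_version(ceiling)

# transformers==4.19.4 is the version from the pip freeze above.
print(satisfies_ceiling("4.19.4"))
```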

@IsaacRodgzb
Author

@mrnikwaws FYI: I tried with Python 3.8 and it still wasn't working. I downgraded to Python 3.7 and that worked.
