
Error while running tutorial_pretrained_bert.ipynb on Ubuntu 22.04.1 LTS #602

Closed
IsaacRodgzb opened this issue Nov 22, 2022 · 4 comments

Comments

@IsaacRodgzb

IsaacRodgzb commented Nov 22, 2022

Hello,

I'm trying to follow the tutorial notebook (compiling a HF model) on a new inf1.xlarge instance running Ubuntu 22.04.1 (codename jammy). But when I run this part:

import os

import torch
import torch_neuron  # registers torch.neuron.trace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

num_cores = 4 # This value should be 4 on inf1.xlarge and inf1.2xlarge
os.environ['NEURON_RT_NUM_CORES'] = str(num_cores)

# Build tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

max_length=128
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")

# Run the original PyTorch model on the compilation example
paraphrase_classification_logits = model(**paraphrase)[0]

# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']

model_neuron = torch.neuron.trace(
    model,
    example_inputs_paraphrase,
    strict=False,
    compiler_args=["--neuroncore-pipeline-cores", str(num_cores)]
)

I get this output in the console:

INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
2022-11-22 18:43:24.392722: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-22 18:43:24.528353: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-11-22 18:43:24.528380: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-22 18:43:25.103517: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-11-22 18:43:25.103588: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-11-22 18:43:25.103602: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$698; falling back to native python function call
ERROR:Neuron:Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
Traceback (most recent call last):
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/convert.py", line 381, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/decorators.py", line 134, in trace
    raise RuntimeError(
RuntimeError: Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
Traceback (most recent call last):
  File "/opt/torchserve/neuron_compile.py", line 31, in <module>
    model_neuron = torch.neuron.trace(
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/opt/torchserve/torchserve_env/lib/python3.9/site-packages/torch_neuron/convert.py", line 492, in stats_post_compiler
    raise RuntimeError(
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
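The RuntimeError above suggests the Neuron compiler could not be invoked from the interpreter running the trace. A quick, generic sanity check is to verify that the compiler's module is importable from that same interpreter; a minimal sketch (the importable module name "neuroncc" for the neuron-cc package is an assumption here, adjust it if your installation exposes a different name):

```python
import importlib.util

def module_importable(name):
    """Return True if the named top-level module can be imported in this interpreter."""
    return importlib.util.find_spec(name) is not None

# "neuroncc" is an assumed module name for the neuron-cc package.
print(module_importable("neuroncc"))
```

Running this in the same virtualenv that performs the trace rules out the case where neuron-cc is installed into a different Python environment than the one torch.neuron is using.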

I had to configure an older repository (focal) in order to install the Neuron drivers, because using 'jammy' would throw an error. I used this:

sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb [arch=amd64 signed-by=/usr/share/keyrings/neuron.gpg] https://apt.repos.neuron.amazonaws.com focal main
EOF

By changing this, I was able to install the required drivers and tools with:

sudo apt-get install aws-neuronx-dkms -y
sudo apt-get install aws-neuron-tools -y

I also installed the necessary Python packages (using Python 3.9). These are all the packages I ended up with:

absl-py==1.3.0
astunparse==1.6.3
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
filelock==3.8.0
flatbuffers==22.10.26
gast==0.4.0
google-auth==2.14.1
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.50.0
h5py==3.7.0
huggingface-hub==0.11.0
idna==3.4
importlib-metadata==5.0.0
joblib==1.2.0
keras==2.11.0
libclang==14.0.6
Markdown==3.4.1
MarkupSafe==2.1.1
neuron-cc==1.0.post1
numpy==1.23.5
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==21.3
Pillow==9.3.0
protobuf==3.19.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==3.0.9
PyYAML==6.0
regex==2022.10.31
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
six==1.16.0
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.11.0
tensorflow-estimator==2.11.0
tensorflow-io-gcs-filesystem==0.28.0
termcolor==2.1.1
tokenizers==0.12.1
torch==1.11.0
torch-neuron==1.11.0.2.3.0.0
torchvision==0.12.0
tqdm==4.64.1
transformers==4.19.4
typing_extensions==4.4.0
urllib3==1.26.12
Werkzeug==2.2.2
wrapt==1.14.1
zipp==3.10.0

Could it be the Ubuntu version? I know the instructions only cover up to Ubuntu 20, but I'm not sure if that might be the issue. Thanks!

@mrnikwaws
Contributor

Hi @IsaacRodgzb,

Python 3.9 is not currently supported. Please try one of the supported Python versions documented here: https://awsdocs-neuron.readthedocs-hosted.com/en/v1.16.1/release-notes/releasecontent.html#dependency-software-supported-versions
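A guard like the following can catch an unsupported interpreter before attempting a trace; the supported set below is only a placeholder for illustration, the authoritative list is in the release notes linked above:

```python
import sys

# Placeholder set for illustration; take the real list from the linked release notes.
SUPPORTED = {(3, 6), (3, 7)}

def interpreter_supported(version_info=None):
    """Check a (major, minor, ...) version tuple against the supported set.

    Defaults to the running interpreter's version.
    """
    vi = sys.version_info if version_info is None else version_info
    return tuple(vi[:2]) in SUPPORTED

print(interpreter_supported())
```

Placing a check like this at the top of a compilation script fails fast with a clear cause, instead of the opaque neuron-cc RuntimeError seen in this issue.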

@IsaacRodgzb
Author

Oh, I hadn't noticed that. Thanks @mrnikwaws! I'll try with 3.8.

@IsaacRodgzb
Author

Should I still follow this, given this issue?

pip install "transformers<4.20.0"
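For reference, whether an installed version satisfies the "transformers<4.20.0" ceiling can be checked with a plain tuple comparison; a minimal sketch that assumes simple X.Y.Z version strings (no pre-release suffixes):

```python
def parse_version(v):
    """Parse a simple 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def satisfies_ceiling(installed, ceiling="4.20.0"):
    """True if installed < ceiling, i.e. the 'transformers<4.20.0' pin holds."""
    return parse_version(installed) < parse_version(ceiling)

# transformers==4.19.4 is the version from the pip freeze above.
print(satisfies_ceiling("4.19.4"))
```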

@IsaacRodgzb
Author

@mrnikwaws FYI: I tried with Python 3.8 and it still wasn't working. I downgraded to Python 3.7 and that worked.
