specialized_infer.py returns 401 Client Error #9
Comments
Hi @yumemio! The error is caused by a model repo name that was changed but not yet reflected in the code. For now, changing the model name to "NexaAIDev/Octopus-v4" should resolve the issue. Specifically, update the tokenizer initialization at line 108 to use that repo name.
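A minimal sketch of the suggested fix (hypothetical; the exact initialization code in specialized_infer.py may differ, and the heavy download is guarded behind a function so the snippet is safe to import without a network connection):

```python
# Updated repo name suggested by the maintainers; the old, since-renamed
# repo id is what triggered the 401 Client Error.
MODEL_ID = "NexaAIDev/Octopus-v4"


def load_octopus():
    # Imported lazily so the constant above is usable even without
    # the transformers package installed.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_octopus()
```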
@yumemio Please try the updated code; the repo name for the model has been fixed.
@LyH88 @zhiyuan8 Now it works like a charm. Thank you! 🤗

Complete log output:
$ python specialized_infer.py
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading checkpoint shards: 100% 2/2 [00:02<00:00, 1.45s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
============= Below is Octopus-V4 response ==============
You are not running the flash-attention implementation, expect numerical differences.
2024-06-04 00:10:05.319401: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-04 00:10:05.372195: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-04 00:10:05.372249: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-04 00:10:05.374141: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-04 00:10:05.382628: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 00:10:06.476382: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
<nexa_4> ('Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.')<nexa_end>
Elapsed time: 7.09s
Functional Token: <nexa_4>
Format Argument: Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.
============= Below is specialized LLM response ==============
config.json: 100% 623/623 [00:00<00:00, 5.82MB/s]
pytorch_model.bin.index.json: 100% 23.9k/23.9k [00:00<00:00, 57.6MB/s]
Downloading shards: 0% 0/2 [00:00<?, ?it/s]
pytorch_model-00001-of-00002.bin: 0% 0.00/9.94G [00:00<?, ?B/s]
...
pytorch_model-00001-of-00002.bin: 100% 9.94G/9.94G [00:34<00:00, 290MB/s]
Downloading shards: 50% 1/2 [00:34<00:34, 34.50s/it]
pytorch_model-00002-of-00002.bin: 0% 0.00/4.54G [00:00<?, ?B/s]
...
pytorch_model-00002-of-00002.bin: 100% 4.54G/4.54G [00:15<00:00, 290MB/s]
Downloading shards: 100% 2/2 [00:50<00:00, 25.15s/it]
Loading checkpoint shards: 100% 2/2 [00:03<00:00, 1.98s/it]
generation_config.json: 100% 120/120 [00:00<00:00, 1.15MB/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
tokenizer_config.json: 100% 1.69k/1.69k [00:00<00:00, 16.3MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 198MB/s]
added_tokens.json: 100% 90.0/90.0 [00:00<00:00, 711kB/s]
special_tokens_map.json: 100% 101/101 [00:00<00:00, 948kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
To find the derivative of the function f(x) = x^3 at the point where x equals 2, we will use the power rule of differentiation. The power rule states that if a function is in the form f(x) = x^n, then the derivative of the function is f'(x) = n * x^(n-1).
In this case, n = 3, so the derivative of f(x) = x^3 is f'(x) = 3 * x^2.
Now, we need to evaluate the derivative at x = 2:
f'(2) = 3 * (2)^2 = 3 * 4 = 12
So, the derivative of f(x) = x^3 at the point where x equals 2 is f'(2) = 12.
Interpreting the result within the context of rate of change and tangent slope:
The derivative of a function represents the rate of change of the function with respect to the independent variable. In this case, the rate of change of f(x) = x^3 with respect to x at x = 2 is 12.
The tangent slope at the point (2, f(2)) is also equal to the derivative f'(2) = 12. This means that the tangent line to the curve y = x^3 at the point (2, 8) has a slope of 12.
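The arithmetic above can be sanity-checked numerically with a central-difference approximation (a standalone check, not part of the repo):

```python
def f(x):
    return x ** 3

# Central-difference approximation of f'(2); the error is O(h^2).
h = 1e-6
numeric = (f(2 + h) - f(2 - h)) / (2 * h)

# Analytic value from the power rule: f'(x) = 3x^2, so f'(2) = 3 * 2**2.
analytic = 3 * 2 ** 2

print(numeric, analytic)
```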
In conclusion, the derivative of f(x) = x^3 at the point where x equals 2 is f'(2) = 12, which represents both the rate of change of the function and the slope of the tangent line at that point.

Closing the issue as resolved.
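The router output shown in the log (`<nexa_4> ('...')<nexa_end>`) can be split into its functional token and format argument with a small helper (a hypothetical illustration; the repo's actual parsing code may differ):

```python
import re


def parse_router_output(text):
    """Extract the functional token and format argument from an
    Octopus-v4 style response like: <nexa_4> ('query')<nexa_end>."""
    match = re.search(r"<(nexa_\d+)>\s*\('(.*?)'\)\s*<nexa_end>", text, re.DOTALL)
    if match is None:
        return None
    return match.group(1), match.group(2)


token, argument = parse_router_output(
    "<nexa_4> ('Determine the derivative of f(x) = x^3 at x = 2.')<nexa_end>"
)
print(token)
print(argument)
```

The functional token (e.g. `nexa_4`) selects which specialized expert model to call, and the format argument is the rephrased query that gets forwarded to it.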
Hello! Kudos to you for making this repository. Also I want to say the paper was awesome too. Combining multiple domain expert models seems to be a promising approach, especially in low-resource settings where we can't run a huge general-purpose model!
I'm having some issues running end-to-end inference with specialized_infer.py (by "end-to-end inference" I mean calling the Octopus model, and then calling an expert model to get the final answer).

First I commented out some experts that do not exist yet:
But then I got this error:
The error suggests that the code is trying to access a 🤗 model that's not released yet. Any plans on making the model public?
Thanks for looking into this!