
specialized_infer.py returns 401 Client Error #9

Closed · yumemio opened this issue May 28, 2024 · 3 comments

yumemio commented May 28, 2024

Hello! Kudos for making this repository, and the paper was great too. Combining multiple domain-expert models seems like a promising approach, especially in low-resource settings where we can't run a huge general-purpose model!

I'm having an issue running end-to-end inference with specialized_infer.py (by "end-to-end inference" I mean calling the Octopus model and then calling an expert model to get the final answer).

First, I commented out the experts that don't exist yet:

from utils import functional_token_mapping, extract_content
from specialized_models_inference import (
    inference_biology,
    inference_business,
    inference_chemistry,
    inference_computer_science,
    inference_math,
    inference_physics,
    inference_electrical_engineering,
    inference_history,
    inference_philosophy,
    inference_law,
    # inference_politics,
    inference_culture,
    inference_economics,
    inference_geography,
    # inference_psychology,
    # inference_health,
    # inference_general,
)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time

torch.random.manual_seed(0)

model_import_mapping = {
    "physics_gpt": lambda: inference_physics.model(),
    "chemistry_gpt": lambda: inference_chemistry.model(),
    "biology_gpt": lambda: inference_biology.model(),
    "computer_science_gpt": lambda: inference_computer_science.model(),
    "math_gpt": lambda: inference_math.model(),
    "business_gpt": lambda: inference_business.model(),
    "electrical_engineering_gpt": lambda: inference_electrical_engineering.model(),
    "history_gpt": lambda: inference_history.model(),
    "philosophy_gpt": lambda: inference_philosophy.model(),
    "law_gpt": lambda: inference_law.model(),
    #"politics_gpt": lambda: inference_politics.model(),
    "culture_gpt": lambda: inference_culture.model(),
    "economics_gpt": lambda: inference_economics.model(),
    "geography_gpt": lambda: inference_geography.model(),
    #"psychology_gpt": lambda: inference_psychology.model(),
    #"health_gpt": lambda: inference_health.model(),
    #"general_gpt": lambda: inference_general.model(),
}

But then I got this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/NexaAIDev/octopus-v4-finetuned-v1/resolve/main/tokenizer_config.json

...

Traceback (most recent call last):
  File "/content/octopus-v4/specialized_infer.py", line 108, in <module>
    tokenizer = AutoTokenizer.from_pretrained("NexaAIDev/octopus-v4-finetuned-v1")
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 817, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 649, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 422, in cached_file
    raise EnvironmentError(
OSError: NexaAIDev/octopus-v4-finetuned-v1 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

The error suggests that the code is trying to access a 🤗 model that hasn't been released yet. Are there any plans to make the model public?
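
(In case the repo is simply private rather than unreleased, the workaround the error message itself suggests is to authenticate against the Hub. A minimal sketch, assuming you have a token with read access; HF_TOKEN is just an illustrative env var name:)

import os

from transformers import AutoTokenizer

# Workaround sketch, not the repo's code: pass a token explicitly, or run
# `huggingface-cli login` once beforehand and drop the token argument.
tokenizer = AutoTokenizer.from_pretrained(
    "NexaAIDev/octopus-v4-finetuned-v1",
    token=os.environ.get("HF_TOKEN"),  # assumed env var holding your token
)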

Thanks for looking into this!

LyH88 (Collaborator) commented Jun 3, 2024

Hi yumemio! The error is due to a model name change that hasn't been reflected in the code yet. For now, changing the model name to "NexaAIDev/Octopus-v4" should resolve the issue. Specifically, update the tokenizer initialization at line 108 of specialized_infer.py to:
tokenizer = AutoTokenizer.from_pretrained("NexaAIDev/Octopus-v4")
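
If you want to sanity-check which repo id is live before re-running, recent versions of huggingface_hub ship a repo_exists helper; a quick sketch:

from huggingface_hub import repo_exists

# Expected at the time of this thread: the renamed repo resolves,
# the old id does not.
print(repo_exists("NexaAIDev/Octopus-v4"))               # True
print(repo_exists("NexaAIDev/octopus-v4-finetuned-v1"))  # False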

zhiyuan8 (Contributor) commented Jun 3, 2024

@yumemio Please try the updated code; the repo name passed to AutoTokenizer has been changed.

yumemio (Author) commented Jun 4, 2024

@LyH88 @zhiyuan8 Now it works like a charm. Thank you! 🤗

Complete log output

$ python specialized_infer.py
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading checkpoint shards: 100% 2/2 [00:02<00:00,  1.45s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

============= Below is Octopus-V4 response ==============

You are not running the flash-attention implementation, expect numerical differences.
2024-06-04 00:10:05.319401: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-04 00:10:05.372195: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-04 00:10:05.372249: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-04 00:10:05.374141: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-04 00:10:05.382628: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 00:10:06.476382: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
<nexa_4> ('Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.')<nexa_end>
Elapsed time: 7.09s
Functional Token: <nexa_4>
Format Argument: Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.

============= Below is specialized LLM response ==============

config.json: 100% 623/623 [00:00<00:00, 5.82MB/s]
pytorch_model.bin.index.json: 100% 23.9k/23.9k [00:00<00:00, 57.6MB/s]
Downloading shards:   0% 0/2 [00:00<?, ?it/s]
pytorch_model-00001-of-00002.bin:   0% 0.00/9.94G [00:00<?, ?B/s]
...
pytorch_model-00001-of-00002.bin: 100% 9.94G/9.94G [00:34<00:00, 290MB/s]
Downloading shards:  50% 1/2 [00:34<00:34, 34.50s/it]
pytorch_model-00002-of-00002.bin:   0% 0.00/4.54G [00:00<?, ?B/s]
...
pytorch_model-00002-of-00002.bin: 100% 4.54G/4.54G [00:15<00:00, 290MB/s]
Downloading shards: 100% 2/2 [00:50<00:00, 25.15s/it]
Loading checkpoint shards: 100% 2/2 [00:03<00:00,  1.98s/it]
generation_config.json: 100% 120/120 [00:00<00:00, 1.15MB/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
tokenizer_config.json: 100% 1.69k/1.69k [00:00<00:00, 16.3MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 198MB/s]
added_tokens.json: 100% 90.0/90.0 [00:00<00:00, 711kB/s]
special_tokens_map.json: 100% 101/101 [00:00<00:00, 948kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.


To find the derivative of the function f(x) = x^3 at the point where x equals 2, we will use the power rule of differentiation. The power rule states that if a function is in the form f(x) = x^n, then the derivative of the function is f'(x) = n * x^(n-1).

In this case, n = 3, so the derivative of f(x) = x^3 is f'(x) = 3 * x^2.

Now, we need to evaluate the derivative at x = 2:

f'(2) = 3 * (2)^2 = 3 * 4 = 12

So, the derivative of f(x) = x^3 at the point where x equals 2 is f'(2) = 12.

Interpreting the result within the context of rate of change and tangent slope:

The derivative of a function represents the rate of change of the function with respect to the independent variable. In this case, the rate of change of f(x) = x^3 with respect to x at x = 2 is 12.

The tangent slope at the point (2, f(2)) is also equal to the derivative f'(2) = 12. This means that the tangent line to the curve y = x^3 at the point (2, 8) has a slope of 12.

In conclusion, the derivative of f(x) = x^3 at the point where x equals 2 is f'(2) = 12, which represents the rate of change of the function and the slope of the tangent line at that point.
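
(For anyone skimming the log above: Octopus-V4 first emits a functional token such as <nexa_4> together with a reformatted query, and specialized_infer.py then maps that token to an expert model. Below is a hypothetical parsing sketch of that step; the repo's utils.extract_content and functional_token_mapping are the authoritative versions, and their real signatures may differ.)

# Hypothetical illustration of parsing the router output shown above;
# see utils.py in the repo for the actual implementation.
raw = "<nexa_4> ('Determine the derivative of the function f(x) = x^3 at x = 2.')<nexa_end>"

functional_token = raw.split(" ", 1)[0]            # -> "<nexa_4>"
query = raw[raw.find("('") + 2 : raw.rfind("')")]  # -> the reformatted question

print(functional_token)  # routed to the math expert in the log above
print(query)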

Closing the issue as resolved.
