text generation details not working when stream=False #1876

Open

uyeongkim opened this issue May 10, 2024 · 1 comment

@uyeongkim

System Info

I ran the Docker image with model-id set to a Llama 3 model downloaded from Hugging Face, and sent a request with the Python code below:

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")


# details=True is requested here, but the returned output has details=None
output = await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output)

but it does not display the details:
TextGenerationOutput(generated_text='100% open-source and available on GitHub. It is distributed', details=None)

and the server log shows the request reaching the router with details: false, even though details=True was passed to the client:

2024-05-10T09:32:15.955615Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("4-nvidia-rtx-a6000"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(12), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="1.425314571s" validation_time="477.908µs" queue_time="66.966µs" inference_time="1.42476984s" time_per_token="118.73082ms" seed="None"}: text_generation_router::server: router/src/server.rs:309: Success
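
For comparison, a sketch of the raw /generate payload that would carry the flag to the router; the details field name is taken from the GenerateParameters in the log above:

# sketch of the request body with the details flag set; the logged
# request above arrived with details: false instead
payload = {
    "inputs": "The huggingface_hub library is ",
    "parameters": {"max_new_tokens": 12, "details": True},
}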

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")


output = await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output)
Server log:

2024-05-10T09:32:15.955615Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("4-nvidia-rtx-a6000"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(12), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="1.425314571s" validation_time="477.908µs" queue_time="66.966µs" inference_time="1.42476984s" time_per_token="118.73082ms" seed="None"}: text_generation_router::server: router/src/server.rs:309: Success

Expected behavior

text_generation should return the generation details instead of None.

@fxmarty (Collaborator) commented May 14, 2024

@uyeongkim I opened a similar issue at: huggingface/huggingface_hub#2281

Related issue for stream=True: #1530

Since you use stream=False, simply using requests instead of huggingface_hub should work for you:

import requests

session = requests.Session()

# use /generate for stream=False; /generate_stream for stream=True
# url = "http://0.0.0.0:80/generate_stream"
url = "http://0.0.0.0:80/generate"
data = {"inputs": "Today I am in Paris and", "parameters": {"max_new_tokens": 20}}
headers = {"Content-Type": "application/json"}

response = session.post(
    url,
    json=data,
    headers=headers,
    stream=False,  # set True together with the /generate_stream URL
)

# with stream=True, iterate over the streamed lines instead:
# for line in response.iter_lines():
#     print(f"line: `{line}`")

print(response.headers)
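
If the goal is to get the generation details in the response body, the flag can also be set directly in the request parameters. A minimal sketch, assuming the same server; the details field name comes from the GenerateParameters shown in the router log above:

import requests

url = "http://0.0.0.0:80/generate"
# this payload is a sketch: "details": True mirrors the GenerateParameters
# field that the router logged as details: false above
data = {
    "inputs": "Today I am in Paris and",
    "parameters": {"max_new_tokens": 20, "details": True},
}

response = requests.post(url, json=data, headers={"Content-Type": "application/json"})
body = response.json()
print(body["generated_text"])
print(body.get("details"))  # should be populated once the flag reaches the router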
