text generation details not working when stream=False #1876

Open

uyeongkim opened this issue May 10, 2024 · 1 comment

@uyeongkim

System Info

I ran the Docker image with model-id set to a Llama 3 model downloaded from Hugging Face, and sent a request with the Python code below:

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")


# details=True is requested here, but the returned output has details=None
output = await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output)

but it does not display the details:
TextGenerationOutput(generated_text='100% open-source and available on GitHub. It is distributed', details=None)

and the server log shows the request reaching the router with details: false, even though details=True was passed to the client:

2024-05-10T09:32:15.955615Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("4-nvidia-rtx-a6000"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(12), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="1.425314571s" validation_time="477.908µs" queue_time="66.966µs" inference_time="1.42476984s" time_per_token="118.73082ms" seed="None"}: text_generation_router::server: router/src/server.rs:309: Success
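
For comparison, a sketch of the raw /generate payload that would carry the flag to the router; the details field name is taken from the GenerateParameters in the log above:

# sketch of the request body with the details flag set; the logged
# request above arrived with details: false instead
payload = {
    "inputs": "The huggingface_hub library is ",
    "parameters": {"max_new_tokens": 12, "details": True},
}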

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")


output = await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output)
Server log:

2024-05-10T09:32:15.955615Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("4-nvidia-rtx-a6000"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(12), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="1.425314571s" validation_time="477.908µs" queue_time="66.966µs" inference_time="1.42476984s" time_per_token="118.73082ms" seed="None"}: text_generation_router::server: router/src/server.rs:309: Success

Expected behavior

text_generation should return the generation details instead of None.

@fxmarty (Collaborator) commented May 14, 2024

@uyeongkim I opened a similar issue at: huggingface/huggingface_hub#2281

Related issue for stream=True: #1530

Since you use stream=False, simply using requests instead of huggingface_hub should work for you:

import requests

session = requests.Session()

# use /generate for stream=False; /generate_stream for stream=True
# url = "http://0.0.0.0:80/generate_stream"
url = "http://0.0.0.0:80/generate"
data = {"inputs": "Today I am in Paris and", "parameters": {"max_new_tokens": 20}}
headers = {"Content-Type": "application/json"}

response = session.post(
    url,
    json=data,
    headers=headers,
    stream=False,  # set True together with the /generate_stream URL
)

# with stream=True, iterate over the streamed lines instead:
# for line in response.iter_lines():
#     print(f"line: `{line}`")

print(response.headers)
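
If the goal is to get the generation details in the response body, the flag can also be set directly in the request parameters. A minimal sketch, assuming the same server; the details field name comes from the GenerateParameters shown in the router log above:

import requests

url = "http://0.0.0.0:80/generate"
# this payload is a sketch: "details": True mirrors the GenerateParameters
# field that the router logged as details: false above
data = {
    "inputs": "Today I am in Paris and",
    "parameters": {"max_new_tokens": 20, "details": True},
}

response = requests.post(url, json=data, headers={"Content-Type": "application/json"})
body = response.json()
print(body["generated_text"])
print(body.get("details"))  # should be populated once the flag reaches the router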
