Description
Your current environment
Python 3.9.21
pydantic-ai 0.3.0
vllm 0.9.0.1
Deploy command:
CUDA_VISIBLE_DEVICES=3 nohup python -m vllm.entrypoints.openai.api_server \
  --model /data/ckpt/Qwen/Qwen2.5-14B-Instruct \
  --tensor-parallel-size 1 \
  --max-model-len 16384 \
  --port 7509 \
  --gpu-memory-utilization 0.95 \
  --disable-log-stats \
  --served-model-name qwen2.5-14b-instruct \
  --max-num-batched-tokens 100000 \
  --max-num-seqs 1500 \
  --enable-prefix-caching \
  --tokenizer-pool-size=32 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --trust-remote-code
🐛 Describe the bug
After receiving the request shown in the log below, the server froze: it produced no response for a long time, and new requests sent afterwards never returned either.
server log:
INFO 06-18 16:48:22 [logger.py:42] Received request chatcmpl-b2fba3c87bfb4896bf83656515c8ecca: prompt: '<|im_start|>system\nplease extract the user profile information from the following text. The output should be a JSON object with the keys "name", "dob" (date of birth), and "bio" (a short biography). If any information is not available, leave that key out of the JSON object.\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n\n{"type": "function", "function": {"name": "final_result", "description": "The final response which ends this conversation", "parameters": {"additionalProperties": false, "properties": {"name": {"type": "string"}, "dob": {"format": "date", "type": "string"}, "bio": {"type": "string"}}, "type": "object"}}}\n\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": , "arguments": }\n</tool_call><|im_end|>\n<|im_start|>user\nMy name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16126, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'type': 'array', 'minItems': 1, 'items': {'type': 'object', 'anyOf': [{'properties': {'name': {'type': 'string', 'enum': ['final_result']}, 'parameters': {'additionalProperties': False, 'properties': {'name': {'type': 'string'}, 'dob': {'format': 'date', 'type': 'string'}, 'bio': {'type': 'string'}}, 'type': 'object'}}, 'required': ['name', 'parameters']}]}}, regex=None, 
choice=None, grammar=None, json_object=None, backend=None, backend_was_auto=False, disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, whitespace_pattern=None, structural_tag=None), extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None, prompt_adapter_request: None.
INFO: 10.70.110.222:56702 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 06-18 16:48:22 [async_llm.py:261] Added request chatcmpl-b2fba3c87bfb4896bf83656515c8ecca.
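To check whether the hang depends on pydantic-ai, the same request could be sent straight to the server. The sketch below rebuilds a payload from the logged prompt and tool definition. It assumes the guided-decoding schema in the log was produced by `tool_choice="required"` (inferred from the log, not confirmed), and `build_payload` and `send` are hypothetical helper names. The socket timeout keeps the client from blocking forever if the server stalls.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:7509/v1"  # port taken from the deploy command

SYSTEM_PROMPT = (
    "please extract the user profile information from the following text. "
    'The output should be a JSON object with the keys "name", "dob" (date of birth), '
    'and "bio" (a short biography). If any information is not available, '
    "leave that key out of the JSON object."
)
USER_PROMPT = (
    "My name is Ben, I was born on January 28th 1990, "
    "I like the chain the dog and the pyramid."
)


def build_payload() -> dict:
    """Build a chat-completions payload mirroring the logged request."""
    tool = {
        "type": "function",
        "function": {
            "name": "final_result",
            "description": "The final response which ends this conversation",
            "parameters": {
                "additionalProperties": False,
                "properties": {
                    "name": {"type": "string"},
                    "dob": {"format": "date", "type": "string"},
                    "bio": {"type": "string"},
                },
                "type": "object",
            },
        },
    }
    return {
        "model": "qwen2.5-14b-instruct",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
        ],
        "tools": [tool],
        "tool_choice": "required",  # assumption: inferred from the logged schema
        "stream": True,  # the test code uses run_stream
    }


def send(payload: dict, timeout: float = 60.0) -> bytes:
    """POST the payload; raises on a socket timeout instead of hanging."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer password",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `send(build_payload())` against the running server should either return the streamed response body or raise a timeout error, which would show whether the freeze reproduces without the client library.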
test-code:
`from datetime import date

from pydantic import ValidationError
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from typing_extensions import TypedDict


class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str


class Chatbot:
    def __init__(self):
        # Point the OpenAI-compatible client at the local vLLM server.
        self.model = OpenAIModel(
            model_name="qwen2.5-14b-instruct",
            provider=OpenAIProvider(
                base_url="http://0.0.0.0:7509/v1/",
                api_key="password",
            ),
        )
        self.agent = Agent(
            model=self.model,
            system_prompt='please extract the user profile information from the following text. The output should be a JSON object with the keys "name", "dob" (date of birth), and "bio" (a short biography). If any information is not available, leave that key out of the JSON object.',
            output_type=UserProfile,
        )


async def main():
    chatbot = Chatbot()
    user_input = 'My name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.'
    # async with chatbot.agent.run_stream(user_prompt=user_input) as response:
    #     async for chunk in response.stream_structured():
    #         print(chunk, end='', flush=True)
    async with chatbot.agent.run_stream(user_input) as result:
        async for message, last in result.stream_structured():
            # print(last, message)
            try:
                # Validate each partial message; only the last must be complete.
                profile = await result.validate_structured_output(
                    message,
                    allow_partial=not last,
                )
            except ValidationError:
                continue
            print(profile)
            # Expected output:
            #> {'name': 'Ben'}
            #> {'name': 'Ben'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the '}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyr'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())`
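Since the server never finishes the response, the test code waits indefinitely. A client-side timeout would at least surface the hang as an explicit failure. The following is a minimal sketch using `asyncio.wait_for`; `run_with_timeout` is a hypothetical helper, and `_hang` only simulates a stalled request.

```python
import asyncio


async def run_with_timeout(coro, seconds: float):
    """Await coro; return (True, result), or (False, None) if it times out."""
    try:
        return True, await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return False, None


async def _hang():
    # Stand-in for a request that the server never answers.
    await asyncio.sleep(3600)


async def _quick():
    # Stand-in for a request that completes normally.
    return "ok"


def demo():
    stalled = asyncio.run(run_with_timeout(_hang(), 0.05))
    finished = asyncio.run(run_with_timeout(_quick(), 1.0))
    return stalled, finished
```

In the test code, the last line could become `asyncio.run(run_with_timeout(main(), 120))`, with the 120-second limit chosen arbitrarily for illustration.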
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.