Sending requests with the openai API library returns HTTP 400 #101

Closed
saber28 opened this issue Apr 16, 2024 · 3 comments

saber28 commented Apr 16, 2024

After upgrading to v0.3.25, sending requests with the openai API library returns HTTP 400, but the code below works correctly on v0.3.20.
Code:

from openai import AsyncOpenAI
import asyncio

output_log = '00_Translate_to_Chinese.log'

client = AsyncOpenAI(base_url="http://localhost:65530/api/oai", api_key="JUSTSECRET_KEY")

async def translate(t):
    text = ""
    stream = await client.completions.create(
        model="rwkv5-7b-v2",
        prompt="\nInstruction: Translate the input text into Chinese\n\nInput: " + t + "\n\nResponse:",
        top_p=0.1,
        frequency_penalty=1,
        stop=['\x00','\n\n','User:'],
        stream=True)
    async for chunk in stream:
        try:
            print(chunk.choices[0].delta['content'], end="", flush=True)
            text += chunk.choices[0].delta['content']
        except Exception:
            # Stop once a chunk no longer carries delta content (end of stream).
            break
    print('\n')
    with open(output_log,'a',encoding='utf-8') as f:
        f.write(text + '\n')

def main():
    while True:
        t = input()
        if t == "":
            continue
        with open(output_log,'a',encoding='utf-8') as f:
            f.write(t + '\n')
#        print('\n')
        asyncio.run(translate(t))

main()

POST request:

Path: /api/oai/completions

Headers:

Host: 127.0.0.1:65530
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept: application/json
Content-Type: application/json
User-Agent: AsyncOpenAI/Python 1.17.1
X-Stainless-Lang: python
X-Stainless-Package-Version: 1.17.1
X-Stainless-OS: Windows
X-Stainless-Arch: other:amd64
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.12.2
Authorization: Bearer JUSTSECRET_KEY
X-Stainless-Async: async:asyncio
Content-Length: 215

Payload:

{"model": "rwkv5-7b-v2", "prompt": "\\nInstruction: Translate the input text into Chinese\\n\\nInput: Hi\\n\\nResponse:", "frequency_penalty": 1, "stop": ["\\u0000", "\\n\\n", "User:"], "stream": true, "top_p": 0.1}

Response:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <title>400: Bad Request</title>
    <style>
    :root {
        --bg-color: #fff;
        --text-color: #222;
    }
    body {
        background: var(--bg-color);
        color: var(--text-color);
        text-align: center;
    }
    pre { text-align: left; padding: 0 1rem; }
    footer{text-align:center;}
    @media (prefers-color-scheme: dark) {
        :root {
            --bg-color: #222;
            --text-color: #ddd;
        }
        a:link { color: red; }
        a:visited { color: #a8aeff; }
        a:hover {color: #a8aeff;}
        a:active {color: #a8aeff;}
    }
    </style>
</head>
<body>
    <div><h1>400: Bad Request</h1><h3>parse http data failed.</h3><pre>There is no more detailed explanation.</pre><hr><footer><a href="https://salvo.rs" target="_blank">salvo</a></footer></div>
</body>
</html>
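
For reference, the logged request can be replayed without the openai client. A minimal sketch using the requests library (assuming the doubled backslashes in the logged payload are escaping artifacts of the debug dump, i.e. the JSON on the wire carries single \n escapes):

import requests

# Same payload as logged above; on v0.3.25 this should return the same 400 page,
# since prompt is sent as a bare string rather than a list.
payload = {
    "model": "rwkv5-7b-v2",
    "prompt": "\nInstruction: Translate the input text into Chinese\n\nInput: Hi\n\nResponse:",
    "frequency_penalty": 1,
    "stop": ["\u0000", "\n\n", "User:"],
    "stream": True,
    "top_p": 0.1,
}
r = requests.post(
    "http://127.0.0.1:65530/api/oai/completions",
    json=payload,
    headers={"Authorization": "Bearer JUSTSECRET_KEY"},
)
print(r.status_code, r.text)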

Config file:

[model]
embed_device = "Gpu"                                     # Device to put the embed tensor ("Cpu" or "Gpu").
head_chunk_size = 8192                                   # DO NOT modify this if you don't know what you are doing.
max_batch = 1                                           # The maximum batches that are cached on GPU.
max_runtime_batch = 1                                    # The maximum batches that can be scheduled for inference at the same time.
model_name = "rwkv5-7b-v2.st" # Name of the model.
model_path = "assets/models"                             # Path to the folder containing all models.
quant = 32                                                # Layers to be quantized.
quant_type = "NF4"                                      # Quantization type ("Int8" or "NF4").
state_chunk_size = 4                                     # The chunk size of layers in model state.
stop = ["\n\n"]                                          # Additional stop words in generation.
token_chunk_size = 128                                   # Size of token chunk that is inferred at once. For high end GPUs, this could be 64 or 128 (faster).
turbo = true                                             # Whether to use alternative GEMM kernel to speed-up long prompts.

[tokenizer]
path = "assets/tokenizer/rwkv_vocab_v20230424.json" # Path to the tokenizer.

[bnf]
enable_bytes_cache = true   # Enable the cache that accelerates the expansion of certain short schemas.
start_nonterminal = "start" # The initial nonterminal of the BNF schemas.

[adapter]
Auto = {}

[listen]
acme = false
domain = "local"
ip = "0.0.0.0"   # Use IpV4.
# ip = "::"        # Use IpV6.
force_pass = true
port = 65530
slot = "permisionkey"
tls = false

[[listen.app_keys]] # Allow multiple app keys.
app_id = "JUSTAISERVER"
secret_key = "JUSTSECRET_KEY"

Console output:

2024-04-16T14:44:59.204Z INFO  [ai00_server] reading config assets/configs/Config.toml...
2024-04-16T14:44:59.206Z INFO  [ai00_server::middleware] ModelInfo {
    version: V5,
    num_layer: 32,
    num_emb: 4096,
    num_hidden: 14336,
    num_vocab: 65536,
    num_head: 64,
}
2024-04-16T14:44:59.207Z INFO  [ai00_server::middleware] type: SafeTensors
2024-04-16T14:44:59.279Z INFO  [ai00_server] server started at 0.0.0.0:65530 without tls
2024-04-16T14:44:59.467Z INFO  [ai00_server::middleware] AdapterInfo {
    name: "NVIDIA GeForce RTX 4070 Ti",
    vendor: 4318,
    device: 10114,
    device_type: DiscreteGpu,
    driver: "NVIDIA",
    driver_info: "551.86",
    backend: Vulkan,
}
2024-04-16T14:45:09.367Z INFO  [ai00_server::middleware] model reloaded

cgisky1980 (Member) commented Apr 17, 2024

{
    "model": "rwkv5-7b-v2",
    "prompt": ["\\nInstruction: Translate the input text into Chinese\\n\\nInput: Hi\\n\\nResponse:"],
    "temperature":1,
    "frequency_penalty":0.3,
    "penalty_decay":0.9982686325973925,
    "stop": [
        "\\u0000",
        "\\n\\n",
        "User:"
    ],
    "stream": true,
    "top_p": 0.1
}

This request works.

cgisky1980 (Member) commented:

1. prompt must be an array of strings; 2. temperature must be present.
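
Applying both fixes to the reporter's script, the create call would look like this (a sketch; every parameter other than prompt and temperature is carried over unchanged from the report):

stream = await client.completions.create(
    model="rwkv5-7b-v2",
    # 1. prompt as a list of strings, not a bare string
    prompt=["\nInstruction: Translate the input text into Chinese\n\nInput: " + t + "\n\nResponse:"],
    # 2. temperature supplied explicitly
    temperature=1,
    top_p=0.1,
    frequency_penalty=1,
    stop=['\x00', '\n\n', 'User:'],
    stream=True)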

saber28 closed this as completed Apr 17, 2024
saber28 (Author) commented Apr 17, 2024

prompt should be a list of strings.
