from openai import AsyncOpenAI
import asyncio

output_log = '00_Translate_to_Chinese.log'
client = AsyncOpenAI(base_url="http://localhost:65530/api/oai", api_key="JUSTSECRET_KEY")

async def translate(t):
    text = ""
    stream = await client.completions.create(
        model="rwkv5-7b-v2",
        prompt="\nInstruction: Translate the input text into Chinese\n\nInput: " + t + "\n\nResponse:",
        top_p=0.1,
        frequency_penalty=1,
        stop=['\x00', '\n\n', 'User:'],
        stream=True)
    async for chunk in stream:
        try:
            print(chunk.choices[0].delta['content'], end="", flush=True)
            text += chunk.choices[0].delta['content']
        except Exception:  # stop once a chunk no longer carries delta content
            break
    print('\n')
    with open(output_log, 'a', encoding='utf-8') as f:
        f.write(text + '\n')

def main():
    while True:
        t = input()
        if t == "":
            continue
        with open(output_log, 'a', encoding='utf-8') as f:
            f.write(t + '\n')
        # print('\n')
        asyncio.run(translate(t))

main()
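When a client library only surfaces "HTTP 400", it can help to bypass it and POST the JSON body directly, so the server's error body (which usually names the rejected field) is visible. The sketch below is stdlib-only and mirrors the payload of the script above; it is a debugging aid under those assumptions, not a confirmed fix, and the exact field v0.3.25 rejects is not known here.

```python
import json

# Payload mirroring the script above; field names follow the OpenAI
# completions API and are assumptions, not verified against ai00_server v0.3.25.
payload = {
    "model": "rwkv5-7b-v2",
    "prompt": "\nInstruction: Translate the input text into Chinese\n\nInput: Hello\n\nResponse:",
    "top_p": 0.1,
    "frequency_penalty": 1,
    "stop": ["\x00", "\n\n", "User:"],
    "stream": False,  # non-streaming makes the error body easier to read
}

def debug_request(base_url="http://localhost:65530/api/oai"):
    """POST the payload directly and print the status and body of the reply."""
    import urllib.request
    import urllib.error
    req = urllib.request.Request(
        base_url + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer JUSTSECRET_KEY",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        # An HTTP 400 lands here; the body often explains what failed validation
        print(e.code, e.read().decode("utf-8"))

print(json.dumps(payload, indent=2))
```

Comparing the printed body against what an older client version sends would show which field the upgraded server no longer accepts.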
[model]
embed_device = "Gpu" # Device to put the embed tensor ("Cpu" or "Gpu").
head_chunk_size = 8192 # DO NOT modify this if you don't know what you are doing.
max_batch = 1 # The maximum batches that are cached on GPU.
max_runtime_batch = 1 # The maximum batches that can be scheduled for inference at the same time.
model_name = "rwkv5-7b-v2.st" # Name of the model.
model_path = "assets/models" # Path to the folder containing all models.
quant = 32 # Layers to be quantized.
quant_type = "NF4" # Quantization type ("Int8" or "NF4").
state_chunk_size = 4 # The chunk size of layers in model state.
stop = ["\n\n"] # Additional stop words in generation.
token_chunk_size = 128 # Size of token chunk that is inferred at once. For high end GPUs, this could be 64 or 128 (faster).
turbo = true # Whether to use alternative GEMM kernel to speed-up long prompts.
[tokenizer]
path = "assets/tokenizer/rwkv_vocab_v20230424.json" # Path to the tokenizer.
[bnf]
enable_bytes_cache = true # Enable the cache that accelerates the expansion of certain short schemas.
start_nonterminal = "start" # The initial nonterminal of the BNF schemas.
[adapter]
Auto = {}
[listen]
acme = false
domain = "local"
ip = "0.0.0.0" # Use IPv4.
# ip = "::" # Use IPv6.
force_pass = true
port = 65530
slot = "permisionkey"
tls = false
[[listen.app_keys]] # Allow multiple app keys.
app_id = "JUSTAISERVER"
secret_key = "JUSTSECRET_KEY"
Console output:
2024-04-16T14:44:59.204Z INFO [ai00_server] reading config assets/configs/Config.toml...
2024-04-16T14:44:59.206Z INFO [ai00_server::middleware] ModelInfo {
version: V5,
num_layer: 32,
num_emb: 4096,
num_hidden: 14336,
num_vocab: 65536,
num_head: 64,
}
2024-04-16T14:44:59.207Z INFO [ai00_server::middleware] type: SafeTensors
2024-04-16T14:44:59.279Z INFO [ai00_server] server started at 0.0.0.0:65530 without tls
2024-04-16T14:44:59.467Z INFO [ai00_server::middleware] AdapterInfo {
name: "NVIDIA GeForce RTX 4070 Ti",
vendor: 4318,
device: 10114,
device_type: DiscreteGpu,
driver: "NVIDIA",
driver_info: "551.86",
backend: Vulkan,
}
2024-04-16T14:45:09.367Z INFO [ai00_server::middleware] model reloaded
After upgrading to v0.3.25, requests sent through the openai API library return HTTP 400, but the same code works fine on v0.3.20.
Code:
POST request:
Path: /api/oai/completions
Header:
Payload:
Response:
Config file:
Console output: