Description
After deploying the ui-tars-2B model locally with Ollama, following the GUI Model Deployment Guide, I ran the Python script and got garbled text like:
and,: , and orc Uran is, c Australia? C : ? True ,否则.
This problem also seems to cause the "output 0" problem in ui-tars-desktop (the response parses to zero actions), judging by the app log:
[2025-01-23 16:02:56.695] [info] (main) [vlmParams_images_len]: 1
[2025-01-23 16:02:56.697] [info] (main) [resizeFactor] maxPixels 1058400 currentPixels 1821369 resizeFactor 0.7623000448470658
[2025-01-23 16:02:56.846] [info] (main) [preprocessResizeImage] width: 1301 height: 813 size: 62.60KB
[2025-01-23 16:02:56.847] [info] (main) vlmBaseUrl http://localhost:11434/v1 vlmApiKey ollama
[2025-01-23 16:03:12.460] [info] (main) [vlm_invoke_time_cost]: 15612ms
[2025-01-23 16:03:12.460] [info] (main) [ui_tars_vlm_response_content] 懒陷入了, on right onceberry, ...(Omission)
[2025-01-23 16:03:12.460] [info] (main) [nl2Command] body {"prediction":" 懒陷入了, on right onceberry, ...(Omission)
[2025-01-23 16:03:12.461] [info] (main) [nl2Command] parsed []
[2025-01-23 16:03:12.461] [info] (main) [emitData] status running
[2025-01-23 16:03:12.461] [info] (main) ======data======
[] { size: { width: 1707, height: 1067 } } {
from: 'gpt',
value: '懒陷入了, on right onceberry, this you right right him right,, right,趁更何况,, right when,你耻, ...(Omission)
timing: { start: 1737619376694, end: 1737619392461, cost: 15767 },
reflections: []
}
========
[2025-01-23 16:03:12.462] [info] (main) [parsed] [] [parsed_length] 0
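The numbers in the `[resizeFactor]` and `[preprocessResizeImage]` log lines are consistent with each other, which suggests the image preprocessing itself is working. The sketch below reproduces them, assuming the factor is the square root of `maxPixels / currentPixels` and that the resized dimensions are truncated to integers. Both are assumptions inferred from the logged values, not taken from the ui-tars-desktop source, and `resize_for_vlm` is a hypothetical helper name.

```python
import math


def resize_for_vlm(width, height, max_pixels):
    """Scale (width, height) down so the pixel count fits within max_pixels.

    Assumption: uniform scaling by sqrt(max_pixels / current_pixels),
    with the resulting dimensions truncated to integers.
    """
    current_pixels = width * height
    factor = math.sqrt(max_pixels / current_pixels)
    return current_pixels, factor, int(width * factor), int(height * factor)


# Values taken from the log: size 1707x1067, maxPixels 1058400.
current_pixels, factor, new_w, new_h = resize_for_vlm(1707, 1067, 1058400)
print(current_pixels, factor, new_w, new_h)
```

With the logged screen size this gives 1821369 current pixels and a 1301 x 813 image, matching the log.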
This is the Python script I used:
import base64
from openai import OpenAI

deployment = "ollama"
instruction = "click the start menu"
screenshot_path = r"C:\Gianmeng\Code\Mass\UI-TARS\Screenshots\screenshot.jpg"

assert deployment in ["ollama", "hf"]
if deployment == "ollama":
    client = OpenAI(
        base_url="http://127.0.0.1:11434/v1/",
        api_key="ollama",  # not used
    )
    # the model name created via ollama CLI, you can check it via command: `ollama list`
    model = "ui-tars"
else:
    client = OpenAI(base_url="<endpoint url>", api_key="<huggingface access tokens>")
    model = "tgi"

prompt = "You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. \n\n ## Output Format\n ```\n Action_Summary: ...\n Action: ...\n ```\n\n ## Action Space\n click(start_box=‘<|box_start|>(x1,y1)<|box_end|>’)\nlong_press(start_box=‘<|box_start|>(x1,y1)<|box_end|>’, time=‘’)\ntype(content=‘’)\nscroll(direction=‘down or up or right or left’)\nopen_app(app_name=‘’)\nnavigate_back()\nnavigate_home()\nWAIT()\nfinished() # Submit the task regardless of whether it succeeds or fails.\n\n ## Note\n - Use English in `Action_Summary` part.\n \n\n ## User Instruction\n"

with open(screenshot_path, "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt + instruction},
                {"type": "image_url", "image_url": {"url": f"data:image/jpg;base64,{encoded_string}"}},
            ],
        },
    ],
)
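print(response.choices[0].message.content)

Following the GUI Model Deployment Guide, I deployed ui-tars-2B locally with Ollama, but when I ran the verification in step 3, the output of the Python script was entirely garbled. The ui-tars-desktop log suggests that this same problem is what causes ui-tars-desktop to output 0 when it runs.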
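To make the failure easier to spot from the script, the reply can be checked against the `Action_Summary: ... / Action: ...` output format requested in the prompt. This is only a rough sketch: `ACTION_RE` and `extract_actions` are hypothetical names, the regex is a guess at what a well-formed reply looks like, and it is not the parser that ui-tars-desktop uses. The `well_formed` string is an invented example, while `garbled` is copied from the log above.

```python
import re

# Assumed shape of a valid line: "Action: name(arguments)".
ACTION_RE = re.compile(r"^\s*Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)


def extract_actions(text):
    """Return (name, raw_arguments) pairs for each 'Action: name(...)' line."""
    return [(m.group(1), m.group(2)) for m in ACTION_RE.finditer(text)]


# Prefix of the garbled reply seen in the ui-tars-desktop log.
garbled = "懒陷入了, on right onceberry, this you right right him right,, right,"

# Hypothetical well-formed reply, written by hand for illustration only.
well_formed = (
    "Action_Summary: Open the start menu.\n"
    "Action: click(start_box='<|box_start|>(10,20)<|box_end|>')"
)

print(extract_actions(garbled))
print(extract_actions(well_formed))
```

An empty list for the garbled reply mirrors the `[parsed] [] [parsed_length] 0` line in the log: the text contains no `Action:` line at all, so there is nothing to execute.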