
Instructions for local models #6336

Open
MikeyBeez opened this issue Nov 21, 2023 · 20 comments

@MikeyBeez

Are there any instructions for using local models rather than GPT-3 or GPT-4? Is there a way to set the base path to 127.0.0.1:11434 to use Ollama, or to 127.0.0.1:1234/v1 for LM Studio? Is there a configuration file or are there environment variables to set for this? Thank you for sharing your wonderful software with the AI community.


This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jan 11, 2024
@msveshnikov

Please, any news here?

@github-actions github-actions bot removed the Stale label Jan 15, 2024
@yf007

yf007 commented Jan 24, 2024

I am also looking for a solution to this problem.

@Progaros

I was trying to get ollama running with AutoGPT.

curl works:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral:instruct",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
{"id":"chatcmpl-447","object":"chat.completion","created":1707528048,"model":"mistral:instruct","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hello there! I'm here to help answer any questions you might have or assist with tasks you may need assistance with. What can I help you with today?\n\nHere are some things I can do:\n\n1. Answer general knowledge questions\n2. Help with math problems\n3. Set reminders and alarms\n4. Create to-do lists and manage tasks\n5. Provide weather updates\n6. Tell jokes or share interesting facts\n7. Assist with email and calendar management\n8. Play music, set timers for cooking, and more!\n\nLet me know what you need help with and I'll do my best to assist!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":140,"total_tokens":156}}

but with this AutoGPT config:

## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
OPENAI_API_KEY=ollama

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL= http://localhost:11434/v1/chat/completions

## SMART_LLM - Smart language model (Default: gpt-4-0314)
SMART_LLM=mixtral:8x7b-instruct-v0.1-q2_K

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo-16k)
FAST_LLM=mistral:instruct

I can't get a connection:

File "/venv/agpt-9TtSrW0h-py3.10/lib/python3.10/site-packages/openai/_base_client.py", line 919, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

Maybe someone will figure it out and can post an update here.
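One thing that might be worth double-checking (an assumption on my part, not verified against this exact setup): the OpenAI Python client appends /chat/completions to the base URL itself, so the base URL probably needs to stop at /v1, and the stray space after the = could also end up in the value. Something along these lines might behave differently:

## base URL ends at /v1; the client appends /chat/completions on its own
OPENAI_API_BASE_URL=http://localhost:11434/v1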

@msveshnikov

The connection is solvable via a proxy, but then you get Pydantic errors everywhere because Mistral produces malformed JSON.

@qwertyuu

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama
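For what it's worth, a rough sketch of that bridge, assuming the LiteLLM proxy CLI is installed and Ollama is serving on its default port (the model name and proxy port are just examples; see the LiteLLM docs linked above for the exact flags):

pip install 'litellm[proxy]'

# start an OpenAI-compatible proxy in front of the local Ollama server
litellm --model ollama/mistral --port 8000

# then point AutoGPT at the proxy in .env, e.g.
# OPENAI_API_BASE_URL=http://localhost:8000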

@ketsapiwiq

Hi!
I am still fighting with Ollama, trying to proxy an agent on my own, but there is one important thing I want to mention regarding this:

The connection is solvable via a proxy, but then you get Pydantic errors everywhere because Mistral produces malformed JSON.

Couldn't we, in theory, code an agent that uses GBNF grammar files to force Mistral or other local LLMs to produce valid JSON?

A simple grammar for valid JSON is available in the llama.cpp repo: https://github.com/ggerganov/llama.cpp/blob/master/grammars/json.gbnf
You then pass that GBNF file to your llama.cpp command (though I figure it would be a problem if the Ollama API doesn't support it).

There are even tools now that generate GBNF files from JSON definitions: https://github.com/richardanaya/gbnf
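For reference, a minimal sketch of what that could look like with llama.cpp directly, assuming a local build and a downloaded GGUF model (the model path and prompt are placeholders):

# fetch the JSON grammar shipped with llama.cpp
curl -O https://raw.githubusercontent.com/ggerganov/llama.cpp/master/grammars/json.gbnf

# constrain sampling so the model can only emit valid JSON
./main -m ./models/mistral-7b-instruct.Q4_K_M.gguf \
    --grammar-file ./json.gbnf \
    -p "Return a JSON object describing the next command to run."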

@ShrirajHegde

ShrirajHegde commented Feb 24, 2024

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought Ollama supports an OpenAI-compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

@Wladastic
Contributor

I got it to run with Mistral 7B AWQ, Neural Chat v3 AWQ, and a few other models.
The only thing is I had to write my own Auto-GPT from scratch, as the prompts from Auto-GPT are too long and confusing for local LLMs.
They return correct responses sometimes, but other times they concentrate so much on Auto-GPT's system prompt that they respond with "Hello, I am using the command ask_user to talk to the user, is this correct?" and then say "Hello, how can I help you?" like 100 times until I cancel it.

My current setup with oobabooga text-generation-webui works best when I add the JSON grammar to it. Even then it only works with very basic prompts and only a few commands; otherwise it kept making up new commands, started hallucinating, responding with multiple commands at once, etc.

@k8si

k8si commented Feb 29, 2024

I got it to make calls to a llamafile server running locally (which has an OpenAI-compatible API) by just setting OPENAI_API_BASE_URL=http://localhost:8080/v1 in my .env. I know the requests are getting through based on the debug logs (plus I can see the calls coming into my llamafile server).
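For reference, a minimal sketch of that setup, assuming the llamafile is serving its OpenAI-compatible endpoint on the default port 8080 (the key value is a placeholder; the local server does not validate it):

## .env
OPENAI_API_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=sk-no-key-required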

However, since the model I'm using doesn't support function calling, the JSON it returns has null for the tool_calls field, which results in ValueError: LLM did not call create_agent function; agent profile creation failed coming from here:

Also, setting OPENAI_FUNCTIONS=False does not seem to do anything.

If anyone knows of an open-source GGUF or llamafile-format model that supports function calling, let me know. That might fix this issue?

@Wladastic
Contributor

Well, instead of using the OpenAI API, use one of the numerous API plugins or check the OpenAI GPT base plugin in the code.
I haven't gotten any local model to fully work with Auto-GPT; GPT-4 can hold the full context without getting too focused on it, but the other models that do work focus too much on the prompt given to the LLM.
Mistral, for example, keeps talking about the constraints it is given and how it tries to comply with them.
I am currently trying to build something similar to this project that uses multiple agent calls for each step to somehow compensate for the lack of context, but it is a bit slow, as sometimes an agent gets very stubborn about its point of view.

@cognitivetech

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

@Wladastic
Contributor

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Hermes 2 Pro works well, but I would rather wait for another version based on Mistral 7B v0.2, as Hermes 2 Pro is based on v0.1, which was only trained on an 8k context length, while v0.2 was trained on 32k.

I also find CapybaraHermes 2.5 Q8_0 works very well for me, except that it sometimes doesn't understand why a JSON response was wrong.
Maybe some other LLM will come along that is cleaner than Mistral 7B Instruct v0.2, as that version is horrible to use currently.

Also, set n_batch to at least 1024, or 2048; that way Auto-GPT runs best so far. It's not on par with GPT-3.5-Turbo, but it kind of works.
Function calling could be implemented from here, though: https://github.com/NousResearch/Hermes-Function-Calling

@qwertyuu

qwertyuu commented Apr 4, 2024

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought Ollama supports an OpenAI-compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Damn! Good to know.

@ketsapiwiq

ketsapiwiq commented Apr 5, 2024

A bit off-topic, but this project has gained a lot of traction lately and works with Hermes Pro or Mistral/Mixtral. It doesn't have many agents yet (web search, main planning loop, and RAG), but it works super well; it might be interesting to study: https://github.com/nilsherzig/LLocalSearch

@cognitivetech

savage

@ZhenhuiTang

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought Ollama supports an OpenAI-compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Damn! Good to know.

Have you been using local LLMs successfully with the OpenAI-compatible API mentioned above?

@k8si k8si mentioned this issue Apr 19, 2024
@Docteur-RS

Now that Ollama is OpenAI-compatible, we should be able to trick AutoGPT by setting OPENAI_API_BASE_URL=http://localhost:11434/v1.
Unfortunately, there are still 2 issues here:

  • The model name has to be an existing proprietary model string like "gpt-4-turbo" or whatever, so using "mistral:latest" doesn't work (see the sketch after this list for a possible workaround).
  • Faking the API key seems to hurt AutoGPT. I'm not sure what it's checking, but "hello world" as an API key won't fly.
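A possible workaround sketch for the first point, assuming Ollama's model-copy command can be used to alias a local model under a name AutoGPT will accept (the alias and the dummy key format below are assumptions, not something verified end to end):

# alias the local model under an OpenAI-style model name
ollama cp mistral:latest gpt-4-turbo

# .env
# OPENAI_API_BASE_URL=http://localhost:11434/v1
# OPENAI_API_KEY=sk-000000000000000000000000000000000000000000000000
# SMART_LLM=gpt-4-turbo
# FAST_LLM=gpt-4-turbo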

@ntindle
Member

ntindle commented May 19, 2024

It should be pretty simple to add a new provider for Ollama by copy-pasting the OpenAI one and removing the parts that aren't needed.

@Docteur-RS

It should be pretty simple to add a new provider for Ollama by copy-pasting the OpenAI one and removing the parts that aren't needed.

Hmm. I don't even know where the provider file is located.

But let's pretend I could duplicate the provider. Is it really worth it? I can't find any tips in the documentation about running tools with local models. And honestly, tool calling is a real must-have to achieve anything!

I just feel like AutoGPT isn't oriented toward local model support anyway. Alternatives like CrewAI and AutoGen, which both have documentation and local tool-calling support, might be a better choice for the moment.
I feel like AutoGPT is a bit like LangGraph: it has an Ollama plugin, but the Ollama tool calling is outdated and never got out of beta. It doesn't feel safe to invest time in this one right now, IMO.
All I can read everywhere is OPENAI OPENAI OPENAI OPENAI...

I hope this project gets better support for running local models soon. It seems nice ;-)
