
Instructions for local models #6336

Open
MikeyBeez opened this issue Nov 21, 2023 · 20 comments

@MikeyBeez

Are there any instructions for using local models rather than GPT-3 or GPT-4? Is there a way to set the base path to 127.0.0.1:11434 to use Ollama, or to 127.0.0.1:1234/v1 for LM Studio? Is there a configuration file or are there environment variables to set for this? Thank you for sharing your wonderful software with the AI community.


This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jan 11, 2024
@msveshnikov

Please, any news here?

@github-actions github-actions bot removed the Stale label Jan 15, 2024
@yf007

yf007 commented Jan 24, 2024

I am also looking for a solution to this problem.

@Progaros

I was trying to get ollama running with AutoGPT.

curl works:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral:instruct",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
{"id":"chatcmpl-447","object":"chat.completion","created":1707528048,"model":"mistral:instruct","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hello there! I'm here to help answer any questions you might have or assist with tasks you may need assistance with. What can I help you with today?\n\nHere are some things I can do:\n\n1. Answer general knowledge questions\n2. Help with math problems\n3. Set reminders and alarms\n4. Create to-do lists and manage tasks\n5. Provide weather updates\n6. Tell jokes or share interesting facts\n7. Assist with email and calendar management\n8. Play music, set timers for cooking, and more!\n\nLet me know what you need help with and I'll do my best to assist!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":140,"total_tokens":156}}

but with this AutoGPT config:

## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
OPENAI_API_KEY=ollama

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL= http://localhost:11434/v1/chat/completions

## SMART_LLM - Smart language model (Default: gpt-4-0314)
SMART_LLM=mixtral:8x7b-instruct-v0.1-q2_K

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo-16k)
FAST_LLM=mistral:instruct

I can't get a connection:

File "/venv/agpt-9TtSrW0h-py3.10/lib/python3.10/site-packages/openai/_base_client.py", line 919, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

Maybe someone will figure it out and can post an update here.
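One thing that might be worth double-checking (an assumption on my part, not verified against this exact setup): the OpenAI Python client appends /chat/completions to the base URL itself, so the base URL probably needs to stop at /v1, and the stray space after the = could also end up in the value. Something along these lines might behave differently:

## base URL ends at /v1; the client appends /chat/completions on its own
OPENAI_API_BASE_URL=http://localhost:11434/v1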

@msveshnikov

The connection is solvable via a proxy, but then you get Pydantic errors everywhere because Mistral produces malformed JSON.

@qwertyuu

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama
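For what it's worth, a rough sketch of that bridge, assuming the LiteLLM proxy CLI is installed and Ollama is serving on its default port (the model name and proxy port are just examples; see the LiteLLM docs linked above for the exact flags):

pip install 'litellm[proxy]'

# start an OpenAI-compatible proxy in front of the local Ollama server
litellm --model ollama/mistral --port 8000

# then point AutoGPT at the proxy in .env, e.g.
# OPENAI_API_BASE_URL=http://localhost:8000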

@ketsapiwiq

Hi!
I am still fighting with Ollama, trying to proxy an agent on my own, but there is one important thing I want to mention regarding this:

The connection is solvable via a proxy, but then you get Pydantic errors everywhere because Mistral produces malformed JSON.

Couldn't we, in theory, code an agent that uses GBNF grammar files to force Mistral or other local LLMs to produce valid JSON?

A simple grammar for valid JSON is available in the llama.cpp repo: https://github.com/ggerganov/llama.cpp/blob/master/grammars/json.gbnf
You then pass that GBNF file to your llama.cpp command (though I figure it would be a problem if the Ollama API doesn't support it).

There are even tools now that generate GBNF files from JSON definitions: https://github.com/richardanaya/gbnf
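For reference, a minimal sketch of what that could look like with llama.cpp directly, assuming a local build and a downloaded GGUF model (the model path and prompt are placeholders):

# fetch the JSON grammar shipped with llama.cpp
curl -O https://raw.githubusercontent.com/ggerganov/llama.cpp/master/grammars/json.gbnf

# constrain sampling so the model can only emit valid JSON
./main -m ./models/mistral-7b-instruct.Q4_K_M.gguf \
    --grammar-file ./json.gbnf \
    -p "Return a JSON object describing the next command to run."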

@ShrirajHegde

ShrirajHegde commented Feb 24, 2024

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought Ollama supports an OpenAI-compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

@Wladastic
Contributor

I got it to run with Mistral 7B AWQ, Neural Chat v3 AWQ, and a few other models.
The only thing is I had to write my own Auto-GPT from scratch, as the prompts from Auto-GPT are too long and confusing for local LLMs.
They return correct responses sometimes, but other times they concentrate so much on Auto-GPT's system prompt that they respond with "Hello, I am using the command ask_user to talk to the user, is this correct?" and then say "Hello, how can I help you?" like 100 times until I cancel it.

My current setup with oobabooga text-generation-webui works best when I add the JSON grammar to it. Even then it only works with very basic prompts and only a few commands; otherwise it kept making up new commands, started hallucinating, responding with multiple commands at once, etc.

@k8si

k8si commented Feb 29, 2024

I got it to make calls to a llamafile server running locally (which has an OpenAI-compatible API) by just setting OPENAI_API_BASE_URL=http://localhost:8080/v1 in my .env. I know the requests are getting through based on the debug logs (plus I can see the calls coming into my llamafile server).
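For reference, a minimal sketch of that setup, assuming the llamafile is serving its OpenAI-compatible endpoint on the default port 8080 (the key value is a placeholder; the local server does not validate it):

## .env
OPENAI_API_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=sk-no-key-required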

However, since the model I'm using doesn't support function calling, the JSON it returns has null for the tool_calls field, which results in ValueError: LLM did not call create_agent function; agent profile creation failed coming from here:

Also, setting OPENAI_FUNCTIONS=False does not seem to do anything.

If anyone knows of an open-source GGUF or llamafile-format model that supports function calling, let me know. That might fix this issue?

@Wladastic
Contributor

Well, instead of using the OpenAI API, use one of the numerous API plugins or check the OpenAI GPT base plugin in the code.
I haven't gotten any local model to fully work with Auto-GPT; GPT-4 can hold the full context without getting too focused on it, but the other models that do work focus too much on the prompt given to the LLM.
Mistral, for example, keeps talking about the constraints it is given and how it tries to comply with them.
I am currently trying to build something similar to this project that uses multiple agent calls for each step to somehow compensate for the lack of context, but it is a bit slow, as sometimes an agent gets very stubborn about its point of view.

@cognitivetech

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

@Wladastic
Contributor

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Hermes 2 Pro works well, but I would rather wait for another version based on Mistral 7B v0.2, as Hermes 2 Pro is based on v0.1, which was only trained on an 8k context length, while v0.2 was trained on 32k.

I also find CapybaraHermes 2.5 Q8_0 works very well for me, except that it sometimes doesn't understand why a JSON response was wrong.
Maybe some other LLM will come along that is cleaner than Mistral 7B Instruct v0.2, as that version is horrible to use currently.

Also, set n_batch to at least 1024, or 2048; that way Auto-GPT runs best so far. It's not on par with GPT-3.5-Turbo, but it kind of works.
Function calling could be implemented from here, though: https://github.com/NousResearch/Hermes-Function-Calling

@qwertyuu

qwertyuu commented Apr 4, 2024

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought Ollama supports an OpenAI-compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Damn! Good to know.

@ketsapiwiq

ketsapiwiq commented Apr 5, 2024

A bit off-topic, but this project has gained a lot of traction lately and works with Hermes Pro or Mistral/Mixtral. It doesn't have many agents yet (web search, main planning loop, and RAG), but it works super well; it might be interesting to study: https://github.com/nilsherzig/LLocalSearch

@cognitivetech

savage

@ZhenhuiTang

If an OpenAI-compatible API is needed, I think you can go through LiteLLM to bridge to your Ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought Ollama supports an OpenAI-compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Damn! Good to know.

Have you been using local LLMs successfully with the OpenAI-compatible API mentioned above?

@k8si k8si mentioned this issue Apr 19, 2024
@Docteur-RS

Now that Ollama is OpenAI-compatible, we should be able to trick AutoGPT by setting OPENAI_API_BASE_URL=http://localhost:11434/v1.
Unfortunately, there are still 2 issues here:

  • The model name has to be an existing proprietary model string like "gpt-4-turbo" or whatever, so using "mistral:latest" doesn't work (see the sketch after this list for a possible workaround).
  • Faking the API key seems to hurt AutoGPT. I'm not sure what it's checking, but "hello world" as an API key won't fly.
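A possible workaround sketch for the first point, assuming Ollama's model-copy command can be used to alias a local model under a name AutoGPT will accept (the alias and the dummy key format below are assumptions, not something verified end to end):

# alias the local model under an OpenAI-style model name
ollama cp mistral:latest gpt-4-turbo

# .env
# OPENAI_API_BASE_URL=http://localhost:11434/v1
# OPENAI_API_KEY=sk-000000000000000000000000000000000000000000000000
# SMART_LLM=gpt-4-turbo
# FAST_LLM=gpt-4-turbo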

@ntindle
Member

ntindle commented May 19, 2024

It should be pretty simple to add a new provider for Ollama by copy-pasting the OpenAI one and removing the parts that aren't needed.

@Docteur-RS

It should be pretty simple to add a new provider for Ollama by copy-pasting the OpenAI one and removing the parts that aren't needed.

Hmm. I don't even know where the provider file is located.

But let's pretend I could duplicate the provider. Is it really worth it? I can't find any tips in the documentation about running tools with local models. And honestly, tool calling is a real must-have to achieve anything!

I just feel like AutoGPT isn't oriented toward local model support anyway. Alternatives like CrewAI and AutoGen, which both have documentation and local tool-calling support, might be a better choice for the moment.
I feel like AutoGPT is a bit like LangGraph: it has an Ollama plugin, but the Ollama tool calling is outdated and never got out of beta. It doesn't feel safe to invest time in this one right now, IMO.
All I can read everywhere is OPENAI OPENAI OPENAI OPENAI...

I hope this project gets better support for running local models soon. It seems nice ;-)
