
Added new config variable API_BASE_URL #477

Merged: 1 commit into main, Feb 17, 2024
Conversation

@TheR1D (Owner) commented Feb 12, 2024

  • Added new config variable API_BASE_URL.
  • Removed old OPENAI_BASE_URL.
  • Minor fixes in show_messages.
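
For readers arriving from the linked issue, a minimal sketch of how the new variable might look in shell_gpt's config file (values are illustrative; only the API_BASE_URL key itself comes from this PR):

```
# ~/.config/shell_gpt/.sgptrc (illustrative sketch, not the full file)
API_BASE_URL=default                   # "default" keeps the standard OpenAI endpoint
# API_BASE_URL=http://localhost:11434  # or point at a locally hosted server such as Ollama
```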

@TheR1D added the bug label ("Something isn't working") Feb 12, 2024
@TheR1D self-assigned this Feb 12, 2024
@TheR1D linked an issue Feb 12, 2024 that may be closed by this pull request
@TheR1D force-pushed the api-base-url branch 7 times, most recently from c17e7c5 to 85c204e on February 17, 2024 at 01:34
@TheR1D merged commit ecb7b26 into main Feb 17, 2024
3 checks passed
@TheR1D deleted the api-base-url branch February 17, 2024 at 01:58
@hrfried commented Feb 19, 2024

Works brilliantly for me running Ollama in a Docker container with 0.0.0.0:11434->11434/tcp and :::11434->11434/tcp mapped. From the host running the container, API_BASE_URL=default finds it without issue, and from another device on the same LAN, API_BASE_URL=http://<ipv4:port> likewise works without issue.
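
For anyone reproducing this, the setup above amounts to something like the following untested sketch (standard Ollama image and port; the LAN IP is a placeholder):

```
# Run Ollama in Docker, publishing its API port on all interfaces
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# In ~/.config/shell_gpt/.sgptrc on the same host (per the comment above):
#   API_BASE_URL=default
# On another device on the LAN, point at the host's address instead:
#   API_BASE_URL=http://192.168.1.50:11434   # placeholder IPv4
```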

I saw you mention somewhere that you were looking for people to test, so consider this my confirmation. I'll probably try it out with the listener endpoints in text-generation-webui as well and can comment on that here.

Brilliant work. Been using sgpt for a while now and nice to slowly start moving to something fully locally hosted. :)

@euroblaze commented Feb 19, 2024

> Been using sgpt for a while now and nice to slowly start moving to something fully locally hosted. :)

That is quite amazing, @hrfried!
Are you planning to do RAG locally?
One question: what hardware did you use to run your Ollama server?

Thanks @TheR1D for putting this amazing piece of software together!

My intention (I'm taking baby steps right now, just educating myself):

1. Get the Ollama Docker image running on Hetzner VMs.

2. Experiment with the various LLM models.

3. See if it would be possible to put a REST API in front of Ollama (see the curl sketch after this list).

4. Query it from our ERP software, e.g. from the Helpdesk module for ticket responses.

5. Figure out RAG tools and procedures, so as to improve the quality of generative outputs.
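
On point 3, it may help that Ollama already exposes a small REST API of its own, so an extra layer might only be needed for auth, rate limiting, or response shaping. A quick illustrative query (model name is an example):

```
# Ask a locally running Ollama instance for a completion via its built-in REST API
curl http://localhost:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Draft a short helpdesk reply.", "stream": false}'
```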

Sorry for digressing, but there seem to be like-minded folks here.

Regards,
Ashant

@hrfried commented Feb 19, 2024

Currently just my main desktop, which has a Ryzen 9 7900X, an NVIDIA 4060 Ti 16 GB, and 64 GB of DDR5. I've gotten small models to run okay on less powerful hardware, but they weren't really performant. Mostly just using LLMs for productivity and exploring the space, training some LoRAs on codebases for work to see what's feasible, etc. Nothing crazy really.

Not super familiar with RAG, to be honest, but Ollama is pretty simple to use. I hadn't used it until today, when I saw it was possible to set it as an endpoint in shell_gpt; I normally use other methods (e.g. textgen-webui) for running local LLMs. Just a docker pull and a docker run, honestly.

Don't know a whole lot about "true" cloud computing, but I imagine you could run an nginx (or similar) reverse proxy into Ollama with a docker-compose workflow and it'd be pretty simple, at least to set up a test case. Not sure what kind of performance you'd get on shared servers, though, especially if it's not GPU compute, or if you're locked into the cloud provider's networking tools. A little out of my wheelhouse, ha.
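
A rough, untested sketch of that reverse-proxy idea (service names, ports, and the nginx config are all illustrative, not from this thread):

```
# docker-compose.yml: nginx in front of Ollama on a shared compose network
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
  proxy:
    image: nginx:alpine
    ports:
      - "8080:80"                 # expose only the proxy, not Ollama itself
    volumes:
      - ./default.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - ollama
volumes:
  ollama:
```

```
# default.conf: forward everything to the ollama service
server {
    listen 80;
    location / {
        proxy_pass http://ollama:11434;
    }
}
```

With something like that, clients would set API_BASE_URL=http://<proxy-host>:8080 and never talk to Ollama directly.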

@euroblaze commented Feb 20, 2024

This repo's issues should probably not be polluted with off-topic discussion, so I'll just conclude here by posting a few pointers to the various topics touched upon.

> Currently just my main desktop, which has a Ryzen 9 7900X, an NVIDIA 4060 Ti 16 GB, and 64 GB of DDR5.

That looks pretty good!
Unfortunately I'm running on a MacBook Air (a business device) and have yet to look into bare-metal options.

Still, I took a blind shot at installing the Ollama Docker image on a VM with 2 vCPUs and 4 GB RAM, running the latest stable Debian.
The install and run were surprisingly smooth (using a non-root /home user).
The outputs took forever to generate, though: a few seconds per word!
The first quick test gave its output only after about an hour of compute, without any tuning or optimisations.

Been researching a few other topics, and here are some pointers for everyone's benefit:

> Don't know a whole lot about "true" cloud computing, but I imagine you could run an nginx (or similar) reverse proxy into Ollama with a docker-compose workflow and it'd be pretty simple, at least to set up a test case.

I'll probably stay away from cloud compute due to the prohibitive costs, especially as we scale towards production.
I'm tending towards dedicated machines from Hetzner (no affiliation), which recycles its pre-used machines or offers shiny new ones.

Thanks @TheR1D and @hrfried for the great software and valuable inputs!

Ashant Chalasani

Labels: bug (Something isn't working)
Projects: none yet
Development: successfully merging this pull request may close the issue "Unable to change OPENAI_BASE_URL in .sgptrc"
3 participants