
Added new config variable API_BASE_URL #477

Merged: 1 commit into main, Feb 17, 2024
Conversation

@TheR1D (Owner) commented Feb 12, 2024

  • Added new config variable API_BASE_URL.
  • Removed old OPENAI_BASE_URL.
  • Minor fixes in show_messages.
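
For readers arriving from the linked issue, a minimal sketch of how the new variable might look in shell_gpt's config file (values are illustrative; only the API_BASE_URL key itself comes from this PR):

```
# ~/.config/shell_gpt/.sgptrc (illustrative sketch, not the full file)
API_BASE_URL=default                   # "default" keeps the standard OpenAI endpoint
# API_BASE_URL=http://localhost:11434  # or point at a locally hosted server such as Ollama
```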

@TheR1D added the bug label ("Something isn't working") Feb 12, 2024
@TheR1D self-assigned this Feb 12, 2024
@TheR1D linked an issue Feb 12, 2024 that may be closed by this pull request
@TheR1D force-pushed the api-base-url branch 7 times, most recently from c17e7c5 to 85c204e on February 17, 2024 at 01:34
@TheR1D merged commit ecb7b26 into main Feb 17, 2024
3 checks passed
@TheR1D deleted the api-base-url branch February 17, 2024 at 01:58
@hrfried commented Feb 19, 2024

Works brilliantly for me running Ollama in a Docker container with 0.0.0.0:11434->11434/tcp and :::11434->11434/tcp mapped. From the host running the container, API_BASE_URL=default finds it without issue, and from another device on the same LAN, API_BASE_URL=http://<ipv4:port> likewise works without issue.
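
For anyone reproducing this, the setup above amounts to something like the following untested sketch (standard Ollama image and port; the LAN IP is a placeholder):

```
# Run Ollama in Docker, publishing its API port on all interfaces
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# In ~/.config/shell_gpt/.sgptrc on the same host (per the comment above):
#   API_BASE_URL=default
# On another device on the LAN, point at the host's address instead:
#   API_BASE_URL=http://192.168.1.50:11434   # placeholder IPv4
```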

I saw you mention somewhere that you were looking for people to test, so consider this my confirmation. I'll probably try it out with the listener endpoints in text-generation-webui as well and can comment on that here.

Brilliant work. Been using sgpt for a while now and nice to slowly start moving to something fully locally hosted. :)

@euroblaze commented Feb 19, 2024

> Been using sgpt for a while now and nice to slowly start moving to something fully locally hosted. :)

That is quite amazing, @hrfried!
Are you planning to do RAG locally?
One question: what hardware did you use to run your Ollama server?

Thanks @TheR1D for putting this amazing piece of software together!

My intention (I'm taking baby steps right now, just educating myself):

1. Get the Ollama Docker image running on Hetzner VMs.

2. Experiment with the various LLM models.

3. See if it would be possible to put a REST API in front of Ollama (see the curl sketch after this list).

4. Query it from our ERP software, e.g. from the Helpdesk module for ticket responses.

5. Figure out RAG tools and procedures, so as to improve the quality of generative outputs.
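
On point 3, it may help that Ollama already exposes a small REST API of its own, so an extra layer might only be needed for auth, rate limiting, or response shaping. A quick illustrative query (model name is an example):

```
# Ask a locally running Ollama instance for a completion via its built-in REST API
curl http://localhost:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Draft a short helpdesk reply.", "stream": false}'
```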

Sorry for digressing, but there seem to be like-minded folks here.

Regards,
Ashant

@hrfried commented Feb 19, 2024

Currently just my main desktop, which has a Ryzen 9 7900X, an NVIDIA 4060 Ti 16 GB, and 64 GB of DDR5. I've gotten small models to run okay on less powerful hardware, but they weren't really performant. Mostly just using LLMs for productivity and exploring the space, training some LoRAs on codebases for work to see what's feasible, etc. Nothing crazy really.

Not super familiar with RAG, to be honest, but Ollama is pretty simple to use. I hadn't used it until today, when I saw it was possible to set it as an endpoint in shell_gpt; I normally use other methods (e.g. textgen-webui) for running local LLMs. Just a docker pull and a docker run, honestly.

Don't know a whole lot about "true" cloud computing, but I imagine you could run an nginx (or similar) reverse proxy into Ollama with a docker-compose workflow and it'd be pretty simple, at least to set up a test case. Not sure what kind of performance you'd get on shared servers, though, especially if it's not GPU compute, or if you're locked into the cloud provider's networking tools. A little out of my wheelhouse, ha.
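
A rough, untested sketch of that reverse-proxy idea (service names, ports, and the nginx config are all illustrative, not from this thread):

```
# docker-compose.yml: nginx in front of Ollama on a shared compose network
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
  proxy:
    image: nginx:alpine
    ports:
      - "8080:80"                 # expose only the proxy, not Ollama itself
    volumes:
      - ./default.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - ollama
volumes:
  ollama:
```

```
# default.conf: forward everything to the ollama service
server {
    listen 80;
    location / {
        proxy_pass http://ollama:11434;
    }
}
```

With something like that, clients would set API_BASE_URL=http://<proxy-host>:8080 and never talk to Ollama directly.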

@euroblaze commented Feb 20, 2024

This repo's issues should probably not be polluted with off-topic discussion, so I'll just conclude here by posting a few pointers to the various topics touched upon.

> Currently just my main desktop, which has a Ryzen 9 7900X, an NVIDIA 4060 Ti 16 GB, and 64 GB of DDR5.

That looks pretty good!
Unfortunately I'm running on a MacBook Air (a business device) and have yet to look into bare-metal options.

Still, I took a blind shot at installing the Ollama Docker image on a VM with 2 vCPUs and 4 GB RAM, running the latest stable Debian.
The install and run were surprisingly smooth (using a non-root /home user).
The outputs took forever to generate, though: a few seconds per word!
The first quick test gave its output only after about an hour of compute, without any tuning or optimisations.

Been researching a few other topics, and here are some pointers for everyone's benefit:

> Don't know a whole lot about "true" cloud computing, but I imagine you could run an nginx (or similar) reverse proxy into Ollama with a docker-compose workflow and it'd be pretty simple, at least to set up a test case.

I'll probably stay away from cloud compute due to the prohibitive costs, especially as we scale towards production.
I'm tending towards dedicated machines from Hetzner (no affiliation), which recycles its pre-used machines or offers shiny new ones.

Thanks @TheR1D and @hrfried for the great software and valuable inputs!

Ashant Chalasani

Labels: bug (Something isn't working)
Projects: none yet
Development: successfully merging this pull request may close the issue "Unable to change OPENAI_BASE_URL in .sgptrc"
3 participants