Local API or Gradio Client Support focus. #3

Closed
waefrebeorn opened this issue Mar 13, 2024 · 6 comments

Comments

@waefrebeorn

Gradio clients that run local language models, such as “OobaBooga”, and expose API support should be a major consideration in the roadmap process. Usable model swapping with cache functionality is feasible. I made an example chart months ago when I saw the potential of the Min-P greedy sampling that Kalomaze worked on: its token accuracy makes it helpful for memory-driven task recall.
[image: example chart]
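
For illustration, here is a minimal sketch of Min-P sampling as I understand Kalomaze's work (the threshold, temperature, and toy logits below are placeholder values, not settings from this project):

```python
import numpy as np

def min_p_sample(logits, min_p=0.05, temperature=1.0, rng=None):
    """Sample a token id using Min-P filtering.

    Keep only tokens whose probability is at least min_p times the
    probability of the most likely token, then renormalize and sample.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()   # threshold relative to the top token
    filtered = np.where(keep, probs, 0.0)
    filtered /= filtered.sum()
    return int(rng.choice(len(filtered), p=filtered))

# Example: a toy 5-token vocabulary
token_id = min_p_sample([2.0, 1.5, 0.2, -1.0, -3.0], min_p=0.1)
print(token_id)
```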

Please note that current projects like MemoryGPT allow API usage, but no widespread application allows for effective model swapping or multi-system offloading. It’s also important to note that a side-server “chain” of cheaper machines, or a GGML-focused network solution, could enable more garage labs.
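
As a rough sketch of what a side-server “chain” could look like, the snippet below round-robins completion requests across a few cheap machines; the worker addresses and the /completion request shape are assumptions (modeled on a llama.cpp-style GGML server), not an existing setup:

```python
import itertools
import requests

# Hypothetical pool of cheap "side server" machines, each running a
# GGML/llama.cpp-style HTTP server; adjust the endpoint and JSON body
# to whatever backend you actually run.
WORKERS = itertools.cycle([
    "http://192.168.1.10:8080",
    "http://192.168.1.11:8080",
    "http://192.168.1.12:8080",
])

def generate(prompt: str, n_predict: int = 128) -> str:
    """Send a completion request to the next worker in the chain."""
    worker = next(WORKERS)
    resp = requests.post(
        f"{worker}/completion",
        json={"prompt": prompt, "n_predict": n_predict},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json().get("content", "")

print(generate("Summarize the last call transcript:"))
```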

Current roadblocks are memory management, non-useful hallucinations (effective hallucinations could generate better idea tokens in an agent context), and the lack of inter-model conversation solutions that are genuinely open source for system-prompting-style implementations.

The most feasible multi-model solution is to let most elements be CPU-offloaded, while features like live training, with one model doing RLHF on another, remain a “drop-in” option that requires a GPU with enough VRAM for training, unless a traditional RAM-based training solution becomes usable with a current base model such as Mistral.
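
For example, a hedged sketch of CPU/disk offloading with Hugging Face transformers plus accelerate, where only what fits stays on the GPU (the model ID and offload folder are just illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Layers that don't fit on the GPU are placed on CPU (and disk) automatically;
# requires the accelerate package to be installed.
model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # split weights across GPU, CPU, and disk
    offload_folder="offload",   # spill what doesn't fit in RAM to disk
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello from a garage lab:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```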

To summarize, a focus on API solutions such as ChatGPT or Claude will stagnate research on local language model feasibility. Creating a feasible framework for agent structures, plus LoRA-based live tuning for memory-retention elements on a version-based task list, will most likely be the best course.
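
A minimal sketch of what LoRA-based tuning for memory retention could look like with the peft library; the rank, target modules, and adapter path are illustrative choices, not a proposal for this repo:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Attach a small LoRA adapter to a frozen base model so only the adapter trains.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_cfg = LoraConfig(
    r=8,                                 # low-rank adapter size
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"], # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()       # only the adapter weights are trainable

# The adapter can be saved and swapped per task / memory snapshot:
model.save_pretrained("adapters/task-v1")
```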

@waefrebeorn
Author

Please note that my picture example is of a call-center agent system I designed in October 2023 that ended up not being used. The designed structure is a feasible alternative: a decision-management system managed by a central query system for each “console”, or emulated agent. Counting the cluster agents in the loop is my proposed measure of scale for the complexity of the task, with the central query system acting as the “database model” that is consistently improved upon; base model usage can be swapped out, and a LoRA imprint system creates the “readiness” for being in the system with minimal overhead.
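
To make the layout concrete, here is a hypothetical sketch of the console / central-query-system structure; all class and method names are mine, not taken from the original chart:

```python
from dataclasses import dataclass, field

@dataclass
class ConsoleAgent:
    """One emulated agent ("console") backed by a swappable base model + LoRA imprint."""
    name: str
    base_model: str
    lora_adapter: str | None = None

    def answer(self, query: str) -> str:
        return f"[{self.name}/{self.base_model}] draft answer to: {query}"

@dataclass
class CentralQuerySystem:
    """The "database model" that routes queries and measures task complexity
    by the number of cluster agents currently in the loop."""
    agents: list[ConsoleAgent] = field(default_factory=list)

    def complexity(self) -> int:
        return len(self.agents)

    def dispatch(self, query: str) -> list[str]:
        return [agent.answer(query) for agent in self.agents]

cqs = CentralQuerySystem([ConsoleAgent("billing", "mistral-7b", "adapters/billing-v1")])
print(cqs.complexity(), cqs.dispatch("Why was I charged twice?"))
```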

@braveokafor

Hi @emangamer,

How does a project like Ollama hold up for this use case?

@waefrebeorn
Author

> Hi @emangamer,
>
> How does a project like Ollama hold up for this use case?

Ollama has a REST API for running and managing models.
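 
For example, a minimal (non-streaming) call to that API, assuming a model such as mistral has already been pulled locally:

```python
import requests

# Ollama listens on port 11434 by default; run `ollama pull mistral` first.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```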

You'd need a different project for training models; this looks to be a simple chat interface with prompt commands.

Gradio-based projects have set a marked standard in the AI space, and the versatile nature of the web environment allows things like Docker-based Google Colab use, greatly increasing availability for phone users as well, as seen with RVC voice synthesis.
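
As a small illustration, a bare-bones Gradio front end; the generate() function here is a stub standing in for whatever local backend (OobaBooga, llama.cpp, Ollama, ...) actually serves the model:

```python
import gradio as gr

def generate(prompt: str) -> str:
    # Placeholder: call your local model backend here.
    return f"(model output for: {prompt})"

demo = gr.Interface(fn=generate, inputs="text", outputs="text", title="Local LLM demo")
demo.launch()  # launch(share=True) gives a temporary public URL, handy on Colab/phones
```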

@braveokafor

Got it, I'll look into Gradio.

@waefrebeorn
Author

> Got it, I'll look into Gradio.

If you're looking for a user client that uses Gradio, I suggest OobaBooga; Gradio is an open-source web UI front end, not an AI model service. OpenDevin should have its interface in Gradio.

@huybery
Member

huybery commented Mar 27, 2024

@emangamer We're currently aiming for rapid prototyping (and won't consider using a complex framework for now), so feel free to discuss future architectural options with us on Slack.
