Find ways for sustainable operation of OA inference (call for brainstorming) #2806
Comments
Note: help with more efficient inference is welcome, but this issue is about sustainability.
Make it easy for individuals to make one-off donations within the app, similar to Wikipedia. Obviously not going to be nearly enough to cover costs, but it could help and would at least be a good norm to try to nudge users towards.
Also, dare I say it, some ads in the app, done in a nice way similar to how Read the Docs does it: https://docs.readthedocs.io/en/stable/advertising/ethical-advertising.html
I saw this yesterday: https://github.com/zilliztech/GPTCache. Perhaps we could use a smaller model to "personalize" the response after we get a cache hit (I'm not sure if GPTCache already does that).
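To illustrate the caching idea above, here is a minimal sketch of a semantic cache: responses are keyed by prompt embeddings rather than exact strings, so similar prompts can reuse a cached answer. This is not GPTCache's actual API; the `SemanticCache` class and the `embed` callable are hypothetical names for illustration, and a real embedding model would replace the toy vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Cache LLM responses keyed by prompt embeddings, not exact strings (illustrative sketch)."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: str -> list[float] (assumed, e.g. a sentence encoder)
        self.threshold = threshold  # minimum similarity to count as a cache hit
        self.entries = []           # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the cached response of the most similar prompt, or None on a miss."""
        vec = self.embed(prompt)
        best, best_sim = None, 0.0
        for emb, resp in self.entries:
            sim = cosine(vec, emb)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt, response):
        """Store a generated response under the prompt's embedding."""
        self.entries.append((self.embed(prompt), response))
```

On a hit, a smaller model could then rewrite the cached response for the new prompt, which is the "personalize" step suggested above.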
Why not offer the same model as OpenAI does? As a not-very-technical developer, I would be happy to pay per usage. It may be costly to implement, but if there's demand for it, it may be profitable in the long run. Here are the steps I'd take: Is there someone on the team able to execute this, or might you have executive support for it? If you want my help with this, contact me at david@transparent.services
I'd be against charging directly or using crypto-tokens (at least initially), since they have the stench of pyramid schemes and rug-pulls. I think it would be better for people to contribute in ways that serve the community; we just need to make that easy and verifiable.
If crypto-tokens need to be the way forward, use something that's directly pegged to the dollar, like DAI, rather than something that invites speculative bubbles and the endless resentment associated with these things.
In my opinion, there are two different things: the project that generates the software, the data, and the models; and a hosted chat service. The first one can progress with only passionate collaborators, but the second one needs a company or a foundation that handles the economics, unless it is based on some kind of peer-to-peer network, which could only provide a best-effort service.

I was not expecting Open Assistant to be a hosted chat service, but rather the base to self-host, or to give others the possibility to create hosted services with the software and models provided by Open Assistant. Getting into the serving business is a daunting task that requires a business structure and an operations team. I think that going into the serving field is not sustainable (unless the company/foundation route is desired): if the service works well, many more people will use it, and the problem will only get bigger.

So, I think a route forward is not to market Open Assistant as a ChatGPT service alternative, but as an open-source alternative in terms of code and models. A demo service could be provided, but users could be educated about the nature of the service so that they understand its limitations and don't have unrealistic expectations. In addition, the hosted chat service is already taking precious time from collaborators who could work on other parts of the system. My impression was that bringing models to consumer hardware was a more important goal of the Open Assistant project than operating a chat service.

This is just my opinion. I'm not opposed to any path forward, and I'm amazed by and very grateful for the great achievements of the collaborators I see in this project.
Efficiency ⊆ sustainability ⊆ efficiency. Here is the ideal/design-architect.
Thanks everyone for your valuable input. I propose to focus on human teaching (feedback collection) and building a highly efficient model factory instead of a scalable inference chat-hosting service. If someone wants to access our models via API, they should simply do this via the HuggingFace inference system (or potentially offerings by other providers).
I agree with GuilleHoardings. I think the focus should be on the software, data, and models, not on hosting. The biggest value of OA is letting small businesses, individuals, and institutions keep control of their technology stack. You can do this without hosting. Another option would be to allow people to register Compute Agents that could run on local hardware. A user could register an agent within their OA account, giving them access to compute. You could introduce a community tax which would allow other people to use someone else's agents when not in use (capping it at some reasonable threshold of GPU cycles). In all honesty though, this second option sounds more like a business than an open source project.
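The Compute Agent idea with a community tax could be sketched roughly as follows. Everything here is hypothetical design, not an existing OA component: the `ComputeAgent` class, the `community_cap` fraction, and the accounting in GPU-hours are all assumptions made up for illustration.

```python
class ComputeAgent:
    """A user-registered machine offering GPU time (hypothetical sketch of the idea above).

    The owner can use the full daily capacity; other community members share a
    capped fraction of it (the "community tax" threshold)."""

    def __init__(self, owner, gpu_hours_per_day, community_cap=0.25):
        self.owner = owner
        self.capacity = gpu_hours_per_day     # total GPU-hours this agent offers per day
        self.community_cap = community_cap    # fraction of capacity non-owners may use
        self.used_by_owner = 0.0
        self.used_by_community = 0.0

    def request(self, user, hours):
        """Grant or deny a compute request, enforcing the community cap."""
        if user == self.owner:
            if self.used_by_owner + hours <= self.capacity:
                self.used_by_owner += hours
                return True
        else:
            if self.used_by_community + hours <= self.capacity * self.community_cap:
                self.used_by_community += hours
                return True
        return False
```

A real system would also need scheduling, verification that the work was actually done, and trust/abuse handling, which is part of why this starts to look like a business.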
Don't you sort of need the "chat-hosting service" to be able to really scale the "human-teaching (feedback collection)" work? Or do you mean lower volume but higher quality, or more focused/targeted collection? I could imagine the really useful data ending up being the annotated chats themselves, which would get the flywheel going to continually improve the models. If someone, e.g. Huggingface, puts their own frontend on the OA models, then the chats and thumbs data that could keep improving them end up with them, so I wonder if some form of self-hosting is still very useful for data collection. For a long time I actually used and fed https://movielens.org/ for my recommendations, explicitly because my thumbs data was going into open research and making their datasets and models better. I sort of feel the same way about using https://open-assistant.io/chat
An inference system is needed, but we could focus more on "collective conversations", e.g. collecting feedback on responses generated for prompts submitted by other users. Third-party compute providers (like StabilityAI or Huggingface) could host the helpful consumer-facing assistant. We could also add an "oa-credits" system that would allow using the assistant or our API; credits could be earned by giving feedback. Inference for a teaching system would probably cost < 150k USD per year, e.g. like running a single 8-GPU pod, while operating large-scale general-purpose inference would cost several millions. The mission of OA should be to provide the best open-source models and human-feedback data, and maybe a general-purpose, quickly deployable inference system (without mandatory human-feedback collection) that could be used by others to host the OA models.
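The "< 150k USD per year" estimate roughly checks out against the 12 USD/h figure quoted for an 8x A100 pod in the issue body. A quick back-of-the-envelope calculation, assuming the pod runs around the clock at that rate:

```python
# Annual cost of keeping one 8x A100 80GB pod running year-round,
# at the ~12 USD/hour rate quoted for lambda labs in this issue.
hourly_rate = 12              # USD per hour for an 8-GPU pod
hours_per_year = 24 * 365     # no downtime assumed
annual_cost = hourly_rate * hours_per_year
print(annual_cost)            # well under the "< 150k USD per year" ceiling
```

That leaves headroom in the estimate for storage, networking, and occasional extra capacity.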
Thanks everyone for your input. |
I highly suggest a free base model that has restrictions, and then a minimal monthly fee for "unlimited" use like MJ does, or in other words, high use allotments that can be slowed to conserve GPU hours. I also, for one, would be absolutely down to pay a monthly fee for special features; maybe some plugins, like bot traders, could cost to use? Just an idea. Love you all dearly for your hard work. Once I get paid I'll be sending some skrill your way 😅💙
I think we could maybe expose the cost to the user and mostly pass the payment through to the cloud providers:
- button: "chat with openassistant (5 free inferences)"
- button: "launch openassistant on aws (affiliate, 0.10 $ to openassistant) total: 2.00 $/hour"
- button: "launch openassistant on huggingface (affiliate, 0.10 $ to openassistant) total: 2.00 $/hour"
- button: "use sponsored api, free, stability.ai"
- button: "use volunteer instance, free"
- button: "self-host, always free"
I mean, jeez, this is perfect. All of these should be implemented.
In order to provide inference online at open-assistant.io/chat/ for a longer period of time, we need to find a sustainable solution that covers the high costs of operation. This issue is a call to gather ideas and opinions on how OA could tackle this challenge.
Background: For launch we have (up to now), through our extremely generous and supportive sponsors StabilityAI & Huggingface, a larger number of A100 80 GB GPUs on which our inference compute runs (brrrrr). Each of these GPUs can currently serve only two requests at a time with a float16 30B model, i.e. while a user sees text streamed to them, on the other side an A100 GPU is 50% occupied. We haven't calculated the exact cost per chat request yet, but it is obvious that renting the necessary GPUs from popular GPU cloud providers would cost a lot (e.g. lambda labs offers an 8x A100 80GB pod for 12 USD/h; availability of these pods is another issue).
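While the exact per-request cost hasn't been calculated, the numbers in the background paragraph give a rough per-stream figure, assuming the 12 USD/h pod rate and two concurrent requests per GPU hold in practice:

```python
# Rough cost of one concurrent chat stream, using the figures from this issue:
# an 8x A100 80GB pod at ~12 USD/h, each GPU serving two requests at a time
# with a float16 30B model.
pod_hourly = 12.0        # USD per hour for the whole 8-GPU pod
gpus_per_pod = 8
streams_per_gpu = 2      # concurrent requests per GPU
cost_per_stream_hour = pod_hourly / (gpus_per_pod * streams_per_gpu)
print(cost_per_stream_hour)  # USD per hour of continuously streamed generation
```

Real per-request cost would be lower than this per-hour figure when streams are short, and higher once idle capacity, load spikes, and non-GPU overhead are included.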
Here are three initial ideas from a private discussion Yannic and I had:
If you have further ideas for sustainable operation or cost reduction, or if you could help organize a partner network or a token-based solution, please comment here or contact us on the OA discord server.