AnyGPT is quite a promising project, released two months before GPT-4o.
It is a versatile multimodal LLaMA-based model that can take not only images as input, but also raw (non-transcribed) speech (for example, for voice cloning) and music. The output is likewise speech, images, and music in token form; those tokens are fed into specialized decoder models (conditioned implicitly, e.g. via unCLIP embeddings rather than text prompts for Stable Diffusion) to generate the final outputs.
I think such a concept could improve the GPT-4o-like experience, although it may require adjusting the encoder/decoder backends to make generation faster.
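To make that flow concrete, here is a rough, hypothetical sketch of the any-to-any token pipeline. None of these function or token names come from the AnyGPT codebase (the real model uses dedicated tokenizers per modality); the stubs just show how every modality can share one discrete token stream, with output spans routed to the matching decoder:

```python
from typing import List, Union

IMG_START, IMG_END = "<img>", "</img>"  # assumed modality-delimiter tokens, not AnyGPT's actual vocab

def encode_speech(wav: bytes) -> List[str]:
    """Stand-in for a speech tokenizer producing discrete units."""
    return ["<sosp>", "spch_17", "spch_4", "<eosp>"]

def llm_generate(tokens: List[str]) -> List[str]:
    """Stand-in for the LLaMA backbone predicting over one unified vocabulary."""
    return [IMG_START, "img_231", "img_87", IMG_END]

def decode_image(tokens: List[str]) -> str:
    """Stand-in for the image decoder (e.g. an unCLIP-style diffusion decoder)."""
    return f"<decoded image from {len(tokens)} tokens>"

def any_to_any(text: str, wav: bytes) -> Union[str, object]:
    prompt = text.split() + encode_speech(wav)  # all modalities share one token stream
    out = llm_generate(prompt)
    if out and out[0] == IMG_START:             # route the output span to its decoder
        return decode_image(out[1:-1])
    return " ".join(out)                        # otherwise it's plain text

print(any_to_any("draw what I hum:", b"\x00\x01"))
```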
I'm not sure how useful those modalities would be, though. For example, it could generate music, but it could just as easily use function calling and pick a track on Spotify. The same goes for image generation.
As I understand it, it's still quite a general-purpose model, so function calling should work with it as well (maybe with some tuning, if the multimodal training overwrote its normal instruction-following too much).
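For reference, the Spotify route mentioned above could be expressed as a standard OpenAI-style tool definition. This is purely illustrative: the function name and its implementation are hypothetical, and nothing like this ships with AnyGPT.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "play_spotify_track",  # hypothetical tool the model could call
        "description": "Search Spotify and start playback of the best match.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Artist and/or track name to search for",
                },
            },
            "required": ["query"],
        },
    },
}]
```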
The best thing is that the model acquires a better semantic understanding of things (how words, sounds, music, and images connect with each other), so it may be worth exploring its "creative soul" :)
Decoders like Stable Diffusion can run very fast on proper GPUs now, if you use things like LCMs and other tricks that allow for few-step or even one-step diffusion.
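As a minimal sketch of that trick, here is the diffusers LCM-LoRA recipe: swap in the LCM scheduler and attach the LCM-LoRA adapter so generation takes ~4 denoising steps instead of 25-50. Model IDs and the prompt are just examples:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load a standard SD 1.5 pipeline, then enable few-step LCM sampling.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

image = pipe(
    "a robot core hanging from the ceiling, portal style",
    num_inference_steps=4,   # vs. 25-50 for a vanilla scheduler
    guidance_scale=1.0,      # LCM works best with little or no CFG
).images[0]
image.save("out.png")
```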
See the project page https://junzhan2000.github.io/AnyGPT.github.io/
https://github.com/OpenMOSS/AnyGPT
P.S. I think it would be a much better addition than just giving it vision via the legacy LLaVA.
(Referencing GlaDOS/README.md, line 14 at commit 9d6bc9a.)