OpenArc 1.0.3: The Vision Update
New features
- Vision support (!)
- OpenArc takes a dynamic approach to how images are processed
- Received messages are checked for base64-encoded images and passed to the appropriate tokenization method, so text-to-text and image-to-text can be mixed in the same chat/input
- There are no normalization steps for images. We don't shrink to 100 dpi, apply a zoom, or anything like that; bring your own preprocessing logic (see the request sketch after this list)
- stream=false is not supported yet
- Load multiple models at once on different devices with the "Model Manager" tab. Unload models from there as well.
- Added model metadata. Loaded models now store data about how they were loaded; we use this throughout inference and to track models in memory across devices. You can now:
- Load both vision and text models into memory
- Be careful though: we don't have any safety measures in place yet, and overcommitting memory usually ends in a stalled load or a memory error
- For those with multiple GPUs, you could run multiple models at once, one per device
- Added a model_type field. When loading a model you now specify either TEXT or VISION; this routes requests to the appropriate class and will be extended to other architectures/tasks in the future (see the load-request sketch after this list)
- Updated the model conversion tool to the latest version, which brings a ton of experimental datatypes/quant types.
- Dashboard has been refactored and is less of a mess
- And many more changes to the codebase that communicate project direction.
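
Here is a minimal sketch of loading a text model and a vision model onto different devices using the new model_type field. The endpoint path, payload keys other than model_type, host/port, and model names are illustrative assumptions, not the documented API.

```python
# Sketch: load two models on different devices with the new model_type field.
# Endpoint path, payload keys (other than model_type), and model names are
# assumptions for illustration only.
import requests

OPENARC_URL = "http://localhost:8000"  # assumed default host/port

def load_model(model_path: str, device: str, model_type: str) -> None:
    """Ask the server to load a model; model_type is TEXT or VISION."""
    payload = {
        "model": model_path,       # path/id of a converted OpenVINO model (assumed key)
        "device": device,          # e.g. "GPU.0", "GPU.1", "CPU" (assumed key)
        "model_type": model_type,  # routes requests to the text or vision class
    }
    resp = requests.post(f"{OPENARC_URL}/model/load", json=payload)  # assumed route
    resp.raise_for_status()

# Load a text model and a vision model side by side. Watch memory: there are
# no safety checks yet, so an oversized pair can stall the load or OOM.
load_model("Qwen2.5-7B-Instruct-int4-ov", device="GPU.0", model_type="TEXT")
load_model("Qwen2-VL-7B-Instruct-int4-ov", device="GPU.1", model_type="VISION")
```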
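And a sketch of one chat request that mixes text and an image. The server checks the message for base64 and routes it to the vision path, but it does no resizing or normalization, so preprocessing happens client-side. The OpenAI-style message shape, endpoint path, and model id are assumptions for illustration.

```python
# Sketch: image-to-text and text-to-text in the same chat endpoint.
# Message shape, endpoint path, host/port, and model id are assumptions.
import base64
import io
import requests
from PIL import Image

OPENARC_URL = "http://localhost:8000"  # assumed default host/port

# Bring your own preprocessing: resize/convert before encoding, since the
# server will not touch the pixels.
img = Image.open("diagram.png").convert("RGB")
img.thumbnail((1024, 1024))
buf = io.BytesIO()
img.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

payload = {
    "model": "Qwen2-VL-7B-Instruct-int4-ov",  # a VISION model already loaded via the Model Manager
    "stream": True,  # stream=false is not supported yet
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this diagram show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
}

# A plain-text message to the same endpoint stays on the text-to-text path.
with requests.post(f"{OPENARC_URL}/v1/chat/completions", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```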
Issues
- Right now gemma3 has specific requirements for inference. We are working out the right set of parameters to load with, and it needs better documentation
- Inference doesn't usually fail gracefully; the threading needs better handling so the API doesn't become inaccessible or crash when a thread fails for whatever reason
- Concurrent requests to multiple loaded models are not yet implemented, and we don't have queuing yet
- There are probably other issues, so report whatever you encounter on Discord or GitHub, especially anything related to the new "vision" support