OpenArc 1.0.3: The Vision Update
New features
- Vision support (!)
- OpenArc takes a dynamic approach to how images are processed
- Received messages are checked for base64-encoded images and passed to the appropriate tokenization method, so text-to-text and image-to-text can be mixed in the same chat/input
- There are no normalization steps for images. We don't shrink to 100 dpi, apply a zoom, or anything like that; bring your own preprocessing logic (see the request sketch after this list)
- stream=false is not supported yet
- Load multiple models at once on different devices with the "Model Manager" tab. Unload models from there as well.
- Added model metadata. Loaded models now store data about how they were loaded; we use this throughout inference and to track models in memory across devices. You can now:
- Load both vision and text models into memory
- Be careful though: we don't have any safety measures in place yet, and overcommitting memory usually ends in a stalled load or a memory error
- For those with multiple GPUs, you could run multiple models at once, one per device
- Added a model_type field. When loading a model you now specify either TEXT or VISION; this routes requests to the appropriate class and will be extended to other architectures/tasks in the future (see the load-request sketch after this list)
- Updated the model conversion tool to the latest version, which brings a ton of experimental datatypes/quant types.
- Dashboard has been refactored and is less of a mess
- And many more changes to the codebase that communicate project direction.
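
Here is a minimal sketch of loading a text model and a vision model onto different devices using the new model_type field. The endpoint path, payload keys other than model_type, host/port, and model names are illustrative assumptions, not the documented API.

```python
# Sketch: load two models on different devices with the new model_type field.
# Endpoint path, payload keys (other than model_type), and model names are
# assumptions for illustration only.
import requests

OPENARC_URL = "http://localhost:8000"  # assumed default host/port

def load_model(model_path: str, device: str, model_type: str) -> None:
    """Ask the server to load a model; model_type is TEXT or VISION."""
    payload = {
        "model": model_path,       # path/id of a converted OpenVINO model (assumed key)
        "device": device,          # e.g. "GPU.0", "GPU.1", "CPU" (assumed key)
        "model_type": model_type,  # routes requests to the text or vision class
    }
    resp = requests.post(f"{OPENARC_URL}/model/load", json=payload)  # assumed route
    resp.raise_for_status()

# Load a text model and a vision model side by side. Watch memory: there are
# no safety checks yet, so an oversized pair can stall the load or OOM.
load_model("Qwen2.5-7B-Instruct-int4-ov", device="GPU.0", model_type="TEXT")
load_model("Qwen2-VL-7B-Instruct-int4-ov", device="GPU.1", model_type="VISION")
```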
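And a sketch of one chat request that mixes text and an image. The server checks the message for base64 and routes it to the vision path, but it does no resizing or normalization, so preprocessing happens client-side. The OpenAI-style message shape, endpoint path, and model id are assumptions for illustration.

```python
# Sketch: image-to-text and text-to-text in the same chat endpoint.
# Message shape, endpoint path, host/port, and model id are assumptions.
import base64
import io
import requests
from PIL import Image

OPENARC_URL = "http://localhost:8000"  # assumed default host/port

# Bring your own preprocessing: resize/convert before encoding, since the
# server will not touch the pixels.
img = Image.open("diagram.png").convert("RGB")
img.thumbnail((1024, 1024))
buf = io.BytesIO()
img.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

payload = {
    "model": "Qwen2-VL-7B-Instruct-int4-ov",  # a VISION model already loaded via the Model Manager
    "stream": True,  # stream=false is not supported yet
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this diagram show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
}

# A plain-text message to the same endpoint stays on the text-to-text path.
with requests.post(f"{OPENARC_URL}/v1/chat/completions", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```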
Issues
- Right now gemma3 has specific requirements for inference. We are working out the right set of parameters to load with, and it needs better documentation
- Inference doesn't usually fail gracefully; the threading needs better handling so the API doesn't become inaccessible or crash when a thread fails for whatever reason
- Concurrent requests to multiple loaded models are not yet implemented, and we don't have queuing yet
- There are probably other issues, so report whatever you encounter on Discord or GitHub, especially anything related to the new "vision" support