AnyModality is an open-source library to simplify MultiModal LLM inference and deployment.
- Supported MultiModal LLM API providers: OpenAI, StabilityAI, Replicate, SageMaker...
- Supported MultiModal LLM models: LLaVA-1.5, MiniGPT-4, InstructBLIP...
- Supported tasks: text-to-image, visual-question-answering...
pip install anymodality
Full documentation can be found here.
Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!
For Replicate MiniGPT-4 endpoint:
from anymodality import Task
task = Task("visual_question_answering")
response = task(
llm="replicate",
model="daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423",
input={
"image": open("static/parking.jpg", "rb"),
"prompt": "It is Wednesday at 4 pm. Can I park at the spot right now? Tell me in 1 line.",
},
stream=False,
)
print(response)
For a self-hosted SageMaker LLaVA-1.5 endpoint:
from anymodality import Task
task = Task("visual_question_answering")
response = task(
llm="sagemaker",
model="huggingface-pytorch-inference-2023-10-29-02-29-37-677",
input={
"image": "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png",
"question": "Describe the image and color details.",
},
stream=False,
)
print(response)
Example code can be found at examples/text_to_image.py.
StabilityAI
from anymodality import Task
task = Task("text_to_image")
response = task(
llm="stabilityai",
model="https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
input={
"text_prompts": [{"text": "A lighthouse on a cliff"}],
"samples": 1,
},
)
# response: list of base64-encoded image strings
from anymodality.tools.image import imgstr_to_PIL
img_pil = imgstr_to_PIL(response[0])
img_pil.show()
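For reference, `imgstr_to_PIL` presumably decodes a base64-encoded image string into a PIL image. A minimal sketch of an equivalent helper, assuming base64-encoded PNG/JPEG payloads (the name `decode_image_str` is hypothetical, not part of the library):

```python
import base64
import io

from PIL import Image  # Pillow


def decode_image_str(img_str: str) -> Image.Image:
    """Hypothetical equivalent of anymodality's imgstr_to_PIL:
    decode a base64-encoded image string into a PIL Image."""
    return Image.open(io.BytesIO(base64.b64decode(img_str)))
```

The returned `Image` can then be displayed with `.show()` or persisted with `.save(path)`.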
OpenAI
from anymodality import Task
task = Task("text_to_image")
response = task(
llm="openai",
model="https://api.openai.com/v1/images/generations",
input={
"prompt": "A cute baby sea otter",
"n": 2,
"size": "1024x1024",
},
)
# response: list of image urls
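Since the OpenAI path returns image URLs rather than image data, a small download step is needed to write the results to disk. A sketch using only the standard library (the helper name `save_images_from_urls` is mine, not part of anymodality):

```python
import os
import urllib.request


def save_images_from_urls(urls, out_dir="."):
    """Fetch each generated-image URL and write it to
    out_dir/image_<i>.png; returns the list of saved paths."""
    paths = []
    for i, url in enumerate(urls):
        path = os.path.join(out_dir, f"image_{i}.png")
        urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```

Calling `save_images_from_urls(response)` would then write `image_0.png`, `image_1.png`, ... next to the script.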
python -m anymodality.tools.webui
You can also pass the LLM provider and model (endpoint) to the webui:
python -m anymodality.tools.webui --llm replicate --model daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423
Models | Inference | Deployment |
---|---|---|
LLaVA-1.5 | Replicate, SageMaker | SageMaker |
MiniGPT-4 | Replicate | NA |
InstructBLIP | Replicate | NA |
mPLUG-Owl | Replicate | NA |
Models | Inference | Deployment |
---|---|---|
DALL·E 2 | OpenAI | NA |
DALL·E 3 | NA | NA |
Stable Diffusion XL | StabilityAI, Replicate | Huggingface |