AnyModality

AnyModality is an open-source library to simplify MultiModal LLM inference and deployment.


Install

pip install anymodality
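
Most providers require API credentials before the usage examples below will run. A minimal sketch of setting them via environment variables; the variable names come from the underlying provider SDKs (Replicate and OpenAI), not from AnyModality itself, so check the documentation for the exact names your backend expects:

import os

# Credentials for the hosted providers used in the examples below.
# These variable names are assumptions based on the providers' own SDKs.
os.environ["REPLICATE_API_TOKEN"] = "r8_..."  # Replicate
os.environ["OPENAI_API_KEY"] = "sk-..."       # OpenAI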

Documentation

Full documentation can be found here.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

Usage

Call MultiModal LLM Endpoint

Visual Question Answering

For Replicate MiniGPT-4 endpoint:

from anymodality import Task
task = Task("visual_question_answering")
response = task(
    llm="replicate",
    model="daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423",
    input={
        "image": open("static/parking.jpg", "rb"),
        "prompt": "It is Wednesday at 4 pm. Can I park at the spot right now? Tell me in 1 line.",
    },
    stream=False,
)
print(response)
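
The call above returns the complete answer at once. Since the task accepts a stream flag, a streaming variant is available; a minimal sketch, assuming that with stream=True the task yields text chunks as they are generated:

from anymodality import Task
task = Task("visual_question_answering")
stream = task(
    llm="replicate",
    model="daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423",
    input={
        "image": open("static/parking.jpg", "rb"),
        "prompt": "It is Wednesday at 4 pm. Can I park at the spot right now? Tell me in 1 line.",
    },
    stream=True,  # assumption: returns an iterator of partial outputs
)
for chunk in stream:
    print(chunk, end="", flush=True)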

For a self-hosted SageMaker LLaVA-1.5 endpoint:

from anymodality import Task
task = Task("visual_question_answering")
response = task(
    llm="sagemaker",
    model="huggingface-pytorch-inference-2023-10-29-02-29-37-677",
    input={
        "image": "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png",
        "question": "Describe the image and color details.",
    },
    stream=False,
)
print(response)
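
Invoking a SageMaker endpoint requires AWS credentials with permission to call sagemaker:InvokeEndpoint. A minimal sketch, assuming AnyModality's SageMaker backend resolves credentials through the standard boto3 credential chain:

import os

# Standard boto3 credential variables; it is an assumption that
# AnyModality's SageMaker backend uses the default credential chain.
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # region where the endpoint is deployed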

Text to Image

Example code can be found at examples/text_to_image.py.

StabilityAI

from anymodality import Task
task = Task("text_to_image")
response = task(
    llm="stabilityai",
    model="https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
    input={
        "text_prompts": [{"text": "A lighthouse on a cliff"}],
        "samples": 1,
    },
)
# response: list of base64-encoded image strings
from anymodality.tools.image import imgstr_to_PIL
img_pil = imgstr_to_PIL(response[0])
img_pil.show()
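
Since imgstr_to_PIL returns a standard Pillow image (as its name suggests), persisting the result is a plain Pillow call:

# Save the decoded image to disk.
img_pil.save("lighthouse.png")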

OpenAI

from anymodality import Task
task = Task("text_to_image")
response = task(
    llm="openai",
    model="https://api.openai.com/v1/images/generations",
    input={
        "prompt": "A cute baby sea otter",
        "n": 2,
        "size": "1024x1024",
    },
)
# response: list of image URLs
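
Unlike the StabilityAI endpoint, OpenAI returns URLs rather than image data, so you download the images yourself. A minimal sketch using requests and Pillow (neither is part of AnyModality):

import io
import requests
from PIL import Image

# Fetch each generated image and open it as a Pillow image.
for url in response:
    img = Image.open(io.BytesIO(requests.get(url).content))
    img.show()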

Start WebUI for Visual Question Answering

python -m anymodality.tools.webui

You can also pass the LLM provider and model (endpoint) to the WebUI:

python -m anymodality.tools.webui --llm replicate --model daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423


Supported Models

Visual Question Answering

| Models       | Inference            | Deployment |
| ------------ | -------------------- | ---------- |
| LLaVA-1.5    | Replicate, SageMaker | SageMaker  |
| MiniGPT-4    | Replicate            | NA         |
| InstructBLIP | Replicate            | NA         |
| mPLUG-Owl    | Replicate            | NA         |

Text to Image

| Models              | Inference              | Deployment  |
| ------------------- | ---------------------- | ----------- |
| DALL·E 2            | OpenAI                 | NA          |
| DALL·E 3            | NA                     | NA          |
| Stable Diffusion XL | StabilityAI, Replicate | Huggingface |
