AnyModality

AnyModality is an open-source library to simplify MultiModal LLM inference and deployment.


Install

pip install anymodality
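
Most providers require API credentials before the usage examples below will run. A minimal sketch of setting them via environment variables; the variable names come from the underlying provider SDKs (Replicate and OpenAI), not from AnyModality itself, so check the documentation for the exact names your backend expects:

import os

# Credentials for the hosted providers used in the examples below.
# These variable names are assumptions based on the providers' own SDKs.
os.environ["REPLICATE_API_TOKEN"] = "r8_..."  # Replicate
os.environ["OPENAI_API_KEY"] = "sk-..."       # OpenAI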

Documentation

Full documentation can be found here.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

Usage

Call MultiModal LLM Endpoint

Visual Question Answering

For Replicate MiniGPT-4 endpoint:

from anymodality import Task
task = Task("visual_question_answering")
response = task(
    llm="replicate",
    model="daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423",
    input={
        "image": open("static/parking.jpg", "rb"),
        "prompt": "It is Wednesday at 4 pm. Can I park at the spot right now? Tell me in 1 line.",
    },
    stream=False,
)
print(response)
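
The call above returns the complete answer at once. Since the task accepts a stream flag, a streaming variant is available; a minimal sketch, assuming that with stream=True the task yields text chunks as they are generated:

from anymodality import Task
task = Task("visual_question_answering")
stream = task(
    llm="replicate",
    model="daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423",
    input={
        "image": open("static/parking.jpg", "rb"),
        "prompt": "It is Wednesday at 4 pm. Can I park at the spot right now? Tell me in 1 line.",
    },
    stream=True,  # assumption: returns an iterator of partial outputs
)
for chunk in stream:
    print(chunk, end="", flush=True)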

For a self-hosted SageMaker LLaVA-1.5 endpoint:

from anymodality import Task
task = Task("visual_question_answering")
response = task(
    llm="sagemaker",
    model="huggingface-pytorch-inference-2023-10-29-02-29-37-677",
    input={
        "image": "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png",
        "question": "Describe the image and color details.",
    },
    stream=False,
)
print(response)
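
Invoking a SageMaker endpoint requires AWS credentials with permission to call sagemaker:InvokeEndpoint. A minimal sketch, assuming AnyModality's SageMaker backend resolves credentials through the standard boto3 credential chain:

import os

# Standard boto3 credential variables; it is an assumption that
# AnyModality's SageMaker backend uses the default credential chain.
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # region where the endpoint is deployed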

Text to Image

Example code can be found at examples/text_to_image.py.

StabilityAI

from anymodality import Task
task = Task("text_to_image")
response = task(
    llm="stabilityai",
    model="https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
    input={
        "text_prompts": [{"text": "A lighthouse on a cliff"}],
        "samples": 1,
    },
)
# response: list of base64-encoded image strings
from anymodality.tools.image import imgstr_to_PIL
img_pil = imgstr_to_PIL(response[0])
img_pil.show()
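
Since imgstr_to_PIL returns a standard Pillow image (as its name suggests), persisting the result is a plain Pillow call:

# Save the decoded image to disk.
img_pil.save("lighthouse.png")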

OpenAI

from anymodality import Task
task = Task("text_to_image")
response = task(
    llm="openai",
    model="https://api.openai.com/v1/images/generations",
    input={
        "prompt": "A cute baby sea otter",
        "n": 2,
        "size": "1024x1024",
    },
)
# response: list of image URLs
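
Unlike the StabilityAI endpoint, OpenAI returns URLs rather than image data, so you download the images yourself. A minimal sketch using requests and Pillow (neither is part of AnyModality):

import io
import requests
from PIL import Image

# Fetch each generated image and open it as a Pillow image.
for url in response:
    img = Image.open(io.BytesIO(requests.get(url).content))
    img.show()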

Start WebUI for Visual Question Answering

python -m anymodality.tools.webui

You can also pass the LLM provider and model (endpoint) to the WebUI:

python -m anymodality.tools.webui --llm replicate --model daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423


Supported Models

Visual Question Answering

| Models       | Inference            | Deployment |
| ------------ | -------------------- | ---------- |
| LLaVA-1.5    | Replicate, SageMaker | SageMaker  |
| MiniGPT-4    | Replicate            | NA         |
| InstructBLIP | Replicate            | NA         |
| mPLUG-Owl    | Replicate            | NA         |

Text to Image

| Models              | Inference              | Deployment  |
| ------------------- | ---------------------- | ----------- |
| DALL·E 2            | OpenAI                 | NA          |
| DALL·E 3            | NA                     | NA          |
| Stable Diffusion XL | StabilityAI, Replicate | Huggingface |
