<img src="http://wandb.me/logo-im-png" width="400" alt="Weights & Biases" />
<!--- @wandbcode{python-report-api} -->

In this minimally annotated notebook, I'll build a simple LLM-powered app with Gradio, W&B, and Chroma (and a tiny bit of LangChain). This notebook shows how to:
- create a simple Hugging Face transformers pipeline for image-to-text
- build a simple chatbot interface with Gradio
- trace LLM calls with the W&B Prompts feature
- use a vector database with Chroma
- create a simple prompt template with LangChain



# 🔩 Setup

We will use:
- [wandb](https://wandb.ai/site) for logging and tracking
- [transformers](https://huggingface.co/docs/transformers/index) for loading in our model and pipeline
- [gradio](https://www.gradio.app/) for creating the app
- [chromadb](https://docs.trychroma.com/) for storing model generations
- [openai](https://pypi.org/project/openai/) for the embedding function used in our Chroma database
- [tiktoken](https://github.com/openai/tiktoken) for tokenizing text passed to the OpenAI embedding model
- [langchain](https://www.langchain.com/) for a simple prompt template
- [accelerate](https://huggingface.co/docs/accelerate/index) for quantized loading of our HF pipeline
- [bitsandbytes](https://huggingface.co/docs/accelerate/usage_guides/quantization) for quantized loading of our HF pipeline

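Before running the install cell, it can help to check which of the packages above are already importable. The following is a minimal sketch using only the standard library. The `REQUIRED` list and the `missing_packages` helper are illustrative names that do not appear in the original notebook, and the import names are assumed to match the package names listed above.

```python
from importlib.util import find_spec

# Import names assumed to match the package names in the Setup list.
REQUIRED = [
    "wandb", "transformers", "gradio", "chromadb", "openai",
    "tiktoken", "langchain", "accelerate", "bitsandbytes",
]

def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

Anything reported as missing can then be installed by uncommenting the matching `pip install` line in the next cell.
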
In [1]:
import subprocess
import os

result = subprocess.run('bash -c "source /etc/network_turbo && env | grep proxy"', shell=True, capture_output=True, text=True)
output = result.stdout
for line in output.splitlines():
    if '=' in line:
        var, value = line.split('=', 1)
        os.environ[var] = value

In [2]:
# !pip install wandb -qqq
# !pip install git+https://github.com/huggingface/transformers -qqq
# !pip install --upgrade gradio -qqq
# !pip install chromadb -qqq
# !pip install openai -qqq
# !pip install tiktoken -qqq
# !pip install langchain -qqq
# !pip install accelerate -qqq
# !pip install bitsandbytes -qqq


In [2]:
import os
import requests
import numpy as np
import torch
import datetime

# For loading in the tiny-LLaVA-v1-hf model in a transformers pipeline.
import transformers
from transformers import pipeline
from transformers import BitsAndBytesConfig

# For converting input images to PIL images.
from PIL import Image

# For creating the gradio app.
import gradio as gr

# For creating a simple prompt (open to extension) to our model.
from langchain.prompts import PromptTemplate

# Our vector database of choice: Chroma!
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings.openai import OpenAIEmbeddings

import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader

# For loading in our OpenAI API key.
# from google.colab import userdata

# For logging.
import wandb
from wandb.sdk.data_types.trace_tree import Trace
from dotenv import load_dotenv
wandb.login()

# Required for us to load in our pipeline for TinyLLaVA.
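# Compare parsed versions; plain string comparison is lexicographic and can give wrong results.
from packaging import version  # Installed as a dependency of transformers.
assert version.parse(transformers.__version__) >= version.parse("4.35.3")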

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: 1513563570. Use `wandb login --relogin` to force relogin

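A note on the version check in the import cell: comparing version strings with `>=` is a character-by-character comparison, so it can disagree with the numeric ordering of releases. The sketch below illustrates the pitfall with a small standard-library helper. `version_tuple` is a hypothetical name introduced here, and it deliberately ignores pre-release suffixes such as `.dev0`; `packaging.version.parse` is a more complete option when the `packaging` library is available.

```python
import re

def version_tuple(v: str) -> tuple:
    """Extract the leading numeric release components, e.g. '4.36.0.dev0' -> (4, 36, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"not a version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))

# String comparison: "9" sorts after "3", so 4.9.0 looks newer than 4.35.3.
print("4.9.0" >= "4.35.3")                                # True
# Numeric comparison: 9 < 35, so 4.9.0 is correctly treated as older.
print(version_tuple("4.9.0") >= version_tuple("4.35.3"))  # False
```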

# 🧬 Chroma: Vector Database with OpenAI

In [3]:
load_dotenv()

True

In [4]:
# Use OpenAI's embeddings for our Chroma collection.
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
collection = Chroma("conversation_memory", embeddings)


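To make the idea behind the collection concrete: a vector store keeps one embedding per text and, at query time, returns the texts whose embeddings lie closest to the query embedding. The following is a toy sketch of that idea in plain Python, using made-up 3-dimensional vectors in place of real OpenAI embeddings. The names `cosine_similarity`, `top_k`, and `store` are hypothetical, and Chroma's own index and distance metric may differ from the cosine similarity assumed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=2):
    """Return the k texts in `store` (a list of (text, vector) pairs) most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Made-up embeddings, for illustration only.
store = [
    ("a cat on a sofa", [0.9, 0.1, 0.0]),
    ("a dog in a park", [0.1, 0.9, 0.0]),
    ("a kitten sleeping", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))
```

In the app, `collection.similarity_search(query=message, k=2)` plays the role of `top_k`, with the embedding step handled by the OpenAI embedding function.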

# 🧪 Pipeline

In [5]:
# Ref: https://huggingface.co/bczhou/tiny-llava-v1-hf
model_id = "bczhou/tiny-llava-v1-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

pipe = pipeline(
    "image-to-text",
    model=model_id,
    device_map="auto",
    use_fast=True,
    model_kwargs={"quantization_config": bnb_config}
)

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

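As a rough guide to why the pipeline is loaded in 4-bit, the memory needed for the weights scales with the number of bits stored per parameter. The sketch below is back-of-the-envelope arithmetic only. The parameter count is a hypothetical round number, not the actual size of the model, and the estimate ignores quantization constants, activations, and any layers that are kept in higher precision.

```python
def weight_memory_gib(n_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

n_params = 1_000_000_000  # Hypothetical parameter count, for illustration only.
for label, bits in [("fp32", 32), ("bf16", 16), ("nf4", 4)]:
    print(f"{label}: {weight_memory_gib(n_params, bits):.2f} GiB")
```

Under these assumptions, 4-bit storage needs a quarter of the weight memory of bf16, which is what makes the model easier to fit on a small GPU.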

# 🚚 Building the App

In [6]:
#@title Optional, pass in a path or URI to an image!
user_avatar_image_path = "" # @param {type:"string"}
chatbot_avatar_image_path = "" # @param {type:"string"}

# Fall back to the default avatars when no path is provided.
if not user_avatar_image_path:
    img_data = requests.get("https://imgur.com/QehpHeV.png").content
    with open('user_avatar.png', 'wb') as handler:
        handler.write(img_data)
    user_avatar_image_path = "user_avatar.png"

if not chatbot_avatar_image_path:
    img_data = requests.get("https://imgur.com/ki4hPhZ.png").content
    with open('chatbot_avatar.png', 'wb') as handler:
        handler.write(img_data)
    chatbot_avatar_image_path = "chatbot_avatar.png"

In [7]:
print(user_avatar_image_path, chatbot_avatar_image_path)

user_avatar.png chatbot_avatar.png


In [8]:
# Let's get a sample image to use. You can download it and pass it into the app!
# The prompt is: What's unusual about this image?
img_data = requests.get("https://imgur.com/Ca6gjuf.png").content
with open('sample_image.png', 'wb') as handler:
    handler.write(img_data)

In [9]:
max_new_tokens = 200

# Path for storing images.
IMG_ROOT_PATH = "data/"
os.makedirs(IMG_ROOT_PATH, exist_ok=True)

# Define the function with (message, history) + additional_inputs -> str.
def generate_output(message: str, history: list, img: np.ndarray) -> str:
    """Generates an output given a message and image."""
    status = "success"

    # Get detailed description of the image for Chroma.
    query = "Please provide a detailed description of the image."
    prompt = PromptTemplate.from_template(
        "USER: <image>\n" +
        "{query}" +
        "\n" +
        "ASSISTANT: "
    )

    start_time_ms = round(datetime.datetime.now().timestamp() * 1000)
    try:
        outputs = pipe(Image.fromarray(img), prompt=prompt.format(query=query), generate_kwargs={"max_new_tokens": max_new_tokens})
        img_desc = outputs[0]["generated_text"].split("ASSISTANT:")[-1]
        status_message = None
    except Exception as e:
        status = "error"
        status_message = str(e)
        img_desc = ""
    end_time_ms = round(datetime.datetime.now().timestamp() * 1000)

    # Create a span in wandb.
    root_span = Trace(
        name="img_desc_span",
        kind="llm",  # kind can be "llm", "chain", "agent" or "tool"
        status_code=status,
        status_message=status_message,
        metadata={
            "max_new_tokens": max_new_tokens,
            "model_name": model_id,
        },
        start_time_ms=start_time_ms,
        end_time_ms=end_time_ms,
        inputs={"system_prompt": prompt.format(query=query), "query": query},
        outputs={"response": img_desc},
    )

    # Log the span to wandb.
    root_span.log(name="img_desc_trace")

    # Visual Question-Answering!
    prompt = PromptTemplate.from_template(
        "Context: {context}\n\n"
        "USER: <image>\n" +
        "{message}" +
        "\n" +
        "ASSISTANT: "
    )
    context = collection.similarity_search(query=message, k=2)
    context = "\n".join([doc.page_content for doc in context])

    # Forward pass through the model with given prompt template.
    status = "success"  # Reset so an earlier failure does not mark this span as an error.
    start_time_ms = round(datetime.datetime.now().timestamp() * 1000)
    try:
        outputs = pipe(
            Image.fromarray(img),
            prompt=prompt.format(
                context=context,
                message=message
            ),
            generate_kwargs={"max_new_tokens": max_new_tokens}
        )
        response = outputs[0]["generated_text"].split("ASSISTANT:")[-1]
        status_message = None
    except Exception as e:
        status = "error"
        status_message = str(e)
        response = ""
    end_time_ms = round(datetime.datetime.now().timestamp() * 1000)

    # Create a span in wandb.
    root_span = Trace(
        name="response_span",
        kind="llm",  # kind can be "llm", "chain", "agent" or "tool"
        status_code=status,
        status_message=status_message,
        metadata={
            "max_new_tokens": max_new_tokens,
            "model_name": model_id,
        },
        start_time_ms=start_time_ms,
        end_time_ms=end_time_ms,
        inputs={
            "system_prompt": prompt.format(
                context=context,
                message=message
            ),
            "query": message
        },
        outputs={"response": response},
    )

    # Log the span to wandb.
    root_span.log(name="response_trace")

    # Add (img_desc, message, response) 3-tuple to Chroma collection.
    text = f"Image Description: {img_desc}\nUSER: {message}\nASSISTANT: {response}\n"
    collection.add_texts(texts=[text])

    # Return model output.
    return img_desc + "\n\n" + response

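The prompt handling inside `generate_output` can be tried without loading the model. The sketch below rebuilds the visual question-answering prompt with plain `str.format` and extracts the answer in the same way, by splitting on `ASSISTANT:`. It assumes the generated text echoes the prompt before the answer, which is what the split in the function above relies on. `VQA_TEMPLATE`, `build_prompt`, `extract_answer`, and the fake generation string are hypothetical, and the extra `.strip()` is an addition that the notebook does not apply.

```python
VQA_TEMPLATE = "Context: {context}\n\nUSER: <image>\n{message}\nASSISTANT: "

def build_prompt(context: str, message: str) -> str:
    """Fill the template with retrieved context and the user's message."""
    return VQA_TEMPLATE.format(context=context, message=message)

def extract_answer(generated_text: str) -> str:
    """Keep only the text after the last 'ASSISTANT:' marker, without surrounding whitespace."""
    return generated_text.split("ASSISTANT:")[-1].strip()

prompt = build_prompt("Image Description: a red bus", "What colour is the bus?")
fake_generation = prompt + "The bus is red."  # Stand-in for the pipeline output.
print(extract_answer(fake_generation))
```
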
In [27]:
# wandb.finish()


In [10]:
wandb.init(project="building_llm_app")


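Each span logged by `generate_output` needs a start time, an end time, and a status. One way to avoid repeating that bookkeeping is a small context manager, sketched below with the standard library only. `timed_span` and the `record` dictionary are hypothetical names; the keys mirror the arguments passed to `Trace` above, but nothing here calls W&B.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_span(record: dict):
    """Fill `record` with timing and status for the wrapped block; errors are recorded, not raised."""
    record["status"] = "success"
    record["status_message"] = None
    record["start_time_ms"] = round(time.time() * 1000)
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["status_message"] = str(exc)
    finally:
        record["end_time_ms"] = round(time.time() * 1000)

span = {}
with timed_span(span):
    raise RuntimeError("model failed")  # Simulated failure, for illustration only.
print(span["status"], span["status_message"])
```

The resulting values could then be passed to `Trace` as `status_code`, `status_message`, `start_time_ms`, and `end_time_ms`.
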
In [14]:
# Define the ChatInterface, customize, and launch!
gr.ChatInterface(
    generate_output,
    chatbot=gr.Chatbot(
        label="Chat with me!",
        show_label=True,
        container=False,
        scale=5,
        height=300,
        show_share_button=True,
        show_copy_button=True,
        avatar_images=(user_avatar_image_path, chatbot_avatar_image_path),
        likeable=False,
        layout="bubble",
        bubble_full_width=False
    ),
    textbox=gr.Textbox(
        lines=1,
        max_lines=5,
        placeholder="Message ...",
        container=False,
        scale=7,
        info="Input your textual response in the text field and your image below!"
    ),
    additional_inputs="image",
    additional_inputs_accordion=gr.Accordion(
        open=True,
    ),
    title="Language-Image Question Answering with bczhou/TinyLLaVA-v1-hf!",
    description="""
    This simple gradio app internally uses a Large Language-Vision Model (LLVM) and the Chroma vector database for memory.
    Note: this minimal app requires both an image and a text-based query before the chatbot system can respond.
    """,
    theme="soft",
    submit_btn="Submit ▶",
    retry_btn=None,
    undo_btn="Delete Previous",
    clear_btn="Clear",
).launch(debug=True, share=False, server_port=6006)



Running on local URL:  http://127.0.0.1:6006

To create a public link, set `share=True` in `launch()`.


Keyboard interruption in main thread... closing server.

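The interface above calls its function with the message, the chat history, and then one argument per additional input, and expects a string back. The following sketch is a model-free stand-in with the same signature as `generate_output`, which can be handy for testing the UI wiring on its own. `describe_inputs` is a hypothetical name, and the explicit check for a missing image is an assumption about how the image input behaves when nothing is uploaded.

```python
from typing import Any, List, Optional

def describe_inputs(message: str, history: List[Any], img: Optional[Any]) -> str:
    """Same (message, history, image) -> str shape as generate_output, without a model."""
    if img is None:
        return "Please attach an image before sending a message."
    shape = getattr(img, "shape", None)  # NumPy arrays expose .shape
    return f"Received {message!r} with an image of shape {shape} after {len(history)} earlier turns."

print(describe_inputs("What is unusual here?", [], None))
```

It could be swapped in for `generate_output` in the `gr.ChatInterface(...)` call, keeping `additional_inputs="image"` unchanged.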

In [30]:
wandb.finish()


# 🙏 References

**Setup**
- https://wandb.ai/site
- https://huggingface.co/docs/transformers/index
- https://www.gradio.app/
- https://docs.trychroma.com/
- https://pypi.org/project/openai/
- https://www.langchain.com/
- https://huggingface.co/docs/accelerate/index
- https://huggingface.co/docs/accelerate/usage_guides/quantization

**Chroma: Vector Database with OpenAI**
- https://github.com/chroma-core/chroma
- https://docs.trychroma.com/

**Optional, Multi-Modal Vector Database with Chroma**
- https://docs.trychroma.com/multi-modal?lang=py
- https://github.com/chroma-core/chroma/blob/a370684dd032eaf52ad9619c4811449a52cc1e2c/chromadb/utils/embedding_functions.py#L666
- https://github.com/chroma-core/chroma/blob/a370684dd032eaf52ad9619c4811449a52cc1e2c/chromadb/utils/data_loaders.py#L9
- https://github.com/chroma-core/chroma/blob/a370684dd032eaf52ad9619c4811449a52cc1e2c/chromadb/api/client.py#L115
- https://github.com/chroma-core/chroma/blob/a370684dd032eaf52ad9619c4811449a52cc1e2c/chromadb/api/client.py#L188


**Pipeline**
- https://huggingface.co/bczhou/tiny-llava-v1-hf
