In [None]:
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.file.base import (
    DEFAULT_FILE_READER_CLS,
    ImageReader,
)
from llama_index.response.notebook_utils import (
    display_response,
    display_image,
)
from llama_index.indices.query.query_transform.base import (
    ImageOutputQueryTransform,
)

In [None]:
# NOTE: we add the filename as metadata for all documents
def filename_fn(filename):
    return {"file_name": filename}
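
As a quick sanity check, the metadata function can be called directly on a path. The sketch below repeats the function so the snippet is self-contained; the path is only an example, taken from the receipt response shown later in this notebook.

```python
# Same metadata function as in the cell above, repeated so this snippet runs on its own.
def filename_fn(filename: str) -> dict:
    return {"file_name": filename}


# Example path only; any file path string works the same way.
metadata = filename_fn("data/receipts/1100-receipt.jpg")
print(metadata)
```

Every document loaded from that file would carry this dictionary as its metadata, which is what lets us trace a response back to a specific image file.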

# Q&A over Receipt Images

We first ingest our receipt images with `SimpleDirectoryReader` and the metadata function `filename_fn` defined above.  
This gives us `image documents` instead of only text documents.

In [None]:
receipt_reader = SimpleDirectoryReader(
    input_dir="data/receipts",
    file_metadata=filename_fn,
)
receipt_documents = receipt_reader.load_data()

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


We build a simple vector index as usual, but unlike before, our index holds images in addition to text.

In [None]:
receipts_index = VectorStoreIndex.from_documents(receipt_documents)

We can now ask a question that prompts for a response with both text and an image.  
We use a custom query transform, `ImageOutputQueryTransform`, to add instructions on how to display the image nicely in the notebook.

In [None]:
from llama_index.query_engine import TransformQueryEngine


query_engine = receipts_index.as_query_engine()
query_engine = TransformQueryEngine(
    query_engine, query_transform=ImageOutputQueryTransform(width=400)
)
receipts_response = query_engine.query(
    "When was the last time I went to McDonald's and how much did I spend?",
)
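
To make the idea of a query transform concrete, here is a minimal sketch in plain Python of how an instruction could be appended to the query before it reaches the LLM. This is an illustration only: the template wording and the helper name `add_image_instruction` are assumptions, not the library's actual implementation.

```python
# Hypothetical instruction template; the wording used by the library may differ.
IMAGE_INSTRUCTION = (
    "{query_str}\n"
    "Show any image with an HTML <img/> tag, e.g. "
    "<img src='path/to/image.jpg' width='{width}' />."
)


def add_image_instruction(query_str: str, width: int = 400) -> str:
    """Return the original query with a display instruction appended."""
    return IMAGE_INSTRUCTION.format(query_str=query_str, width=width)


transformed = add_image_instruction(
    "When was the last time I went to McDonald's and how much did I spend?",
    width=400,
)
print(transformed)
```

The original question is left untouched at the start of the prompt, so retrieval and answering behave as before; only the formatting of the answer is steered.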

We now have a rich multimodal response with inline text and an image!  

The source nodes section gives additional details on retrieved data used for synthesizing the final response.  
In this case, we can verify that the receipt for McDonald's is correctly retrieved. 

In [None]:
display_response(receipts_response)

**`Final Response:`** <img src="data/receipts/1100-receipt.jpg" width="400" />

The last time you went to McDonald's was on 03/10/2018 and you spent $26.15.
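
One way to check such a response programmatically is to pull the image path out of the `<img>` tag and compare it with the `file_name` metadata of the retrieved documents. The sketch below does this with the standard library only. The response text is copied from the output above; the `source_metadata` list is a hypothetical stand-in for the metadata of the retrieved source nodes, assumed to have the shape produced by `filename_fn`.

```python
import re

# Response text copied from the output above.
response_text = (
    '<img src="data/receipts/1100-receipt.jpg" width="400" />\n\n'
    "The last time you went to McDonald's was on 03/10/2018 and you spent $26.15."
)

# Hypothetical metadata for the retrieved source nodes (assumed shape).
source_metadata = [{"file_name": "data/receipts/1100-receipt.jpg"}]

# Capture the value of the src attribute of each <img> tag.
IMG_SRC = re.compile(r'<img\s+[^>]*src=["\']([^"\']+)["\']')


def image_paths(text: str) -> list:
    """Return every image path referenced by an <img> tag in the text."""
    return IMG_SRC.findall(text)


retrieved_files = {meta["file_name"] for meta in source_metadata}
paths = image_paths(response_text)

# Paths the response shows that were not among the retrieved files.
unmatched = [p for p in paths if p not in retrieved_files]
print(paths, unmatched)
```

An empty `unmatched` list means every image shown in the response corresponds to a file that was actually retrieved.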

# Q&A over LlamaIndex Documentation

We now demo the same for Q&A over the LlamaIndex documentation.  
This demo highlights the ability to synthesize multimodal output from a mixture of text and image documents.

In [None]:
llama_reader = SimpleDirectoryReader(
    input_dir="./data/llama",
    file_metadata=filename_fn,
)
llama_documents = llama_reader.load_data()

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


In [None]:
llama_index = VectorStoreIndex.from_documents(llama_documents)

In [None]:
from llama_index.query_engine import TransformQueryEngine


query_engine = llama_index.as_query_engine(similarity_top_k=2)
query_engine = TransformQueryEngine(
    query_engine, query_transform=ImageOutputQueryTransform(width=400)
)
llama_response = query_engine.query(
    "Show an image to illustrate how tree index works and explain briefly.",
)
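
The `similarity_top_k=2` argument above asks the retriever for the two nodes most similar to the query. As a rough illustration of that idea, here is a toy top-k selection using cosine similarity in plain Python. The three-dimensional vectors and the name `tree_index_text` are made up for illustration, and this sketch is not how the library implements retrieval internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, node_vecs, k=2):
    """Return the names of the k nodes most similar to the query."""
    scored = sorted(
        node_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Toy embeddings, invented for this example.
node_vecs = {
    "tree_index.png": [0.9, 0.1, 0.0],
    "tree_index_text": [0.8, 0.2, 0.1],
    "vector_store_index.png": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.0, 0.0]

top = top_k(query_vec, node_vecs, k=2)
print(top)
```

With `k=2`, only the two closest nodes are passed on for synthesis, which is why exactly two source nodes appear in the response details.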

By inspecting the two source nodes, we see that relevant text and an image describing the tree index were retrieved to synthesize the final multimodal response.

In [None]:
display_response(llama_response)

**`Final Response:`** <img src="data/llama/tree_index.png" width="400" />

This image illustrates how a tree index works. It shows a hierarchical tree structure with a root node at the top and leaf nodes at the bottom. The nodes are connected by branches, which represent the relationships between the nodes. The tree index is a useful way to organize data in a hierarchical structure.

We show another example, asking about the vector store index instead.

In [None]:
llama_response = query_engine.query(
    "Show an image to illustrate how vector store index works and explain briefly.",
)

In [None]:
display_response(llama_response)

**`Final Response:`** <img src="data/llama/vector_store_index.png" width="400" />

This image illustrates how vector store index works. It stores each Node and its corresponding embedding in a Vector Store. The Nodes are represented by circles and the embeddings are represented by arrows. The arrows point from the Node to the embedding, indicating that the Node is associated with the embedding.