# Text + Image Multimodal Pipeline

## Setup

In [7]:
import requests
import json
import time
import pandas as pd

BASE_URL = "http://localhost:8000"


In [8]:
def pp(x):
    print(json.dumps(x, indent=2, ensure_ascii=False))


## Health Check

In [9]:
res = requests.get(f"{BASE_URL}/health/health")
print(res.status_code, res.json())


200 {'status': 'ok'}
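
If the API is still starting up, the health request can fail with a connection error instead of returning 200. The following is a minimal sketch of a retry loop. `wait_until_healthy`, its `probe` argument, and the retry defaults are hypothetical names chosen for illustration and are not part of the service.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool],
                       retries: int = 5,
                       delay_sec: float = 1.0) -> bool:
    """Call `probe` until it returns True or the retries run out."""
    for attempt in range(retries):
        try:
            if probe():
                return True
        except Exception:
            # Treat any error (e.g. connection refused) as "not ready yet".
            pass
        if attempt < retries - 1:
            time.sleep(delay_sec)
    return False
```

In this notebook the probe could be `lambda: requests.get(f"{BASE_URL}/health/health", timeout=5).status_code == 200`.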


In [None]:
# from qdrant_client import QdrantClient
# q = QdrantClient(host="localhost", port=6333)
# q.delete_collection("agentforge_embeddings")

False

## Text Pipeline – Ingestion + RAG

In [None]:
with open("examples/rohan.txt", "rb") as f:
    res = requests.post(
        f"{BASE_URL}/ingest/ingest",
        files={"file": ("rohan.txt", f, "text/plain")}
    )

text_ingest = res.json()
pp(text_ingest)


{
  "doc_id": "f0ffdd93-8641-4270-8ebd-72d9e4350a61",
  "type": "text",
  "chunks": 2,
  "source": "text"
}


In [11]:
payload = {
    "question": "What does the sample document talk about?",
    "top_k": 5
}

res = requests.post(f"{BASE_URL}/query/", json=payload)
text_rag = res.json()
pp(text_rag)


{
  "answer": "The sample document talks about Rohan's journey as he built his skills in AI, data science, and cloud work through various internships and personal projects. It highlights how he aimed to build systems that could automate everyday work and remove slow steps, and how he envisioned a future where smart automation supports every field.",
  "context": [
    "He wanted to build systems that could remove slow steps from everyday work. He planned a one-person business that could earn steady monthly income by using AI tools to automate real workflows. He also began shaping a full AI automation agency with an idea for a flexible “Inbox-to-Action Bot” that could support many types of clients. Even with busy days, he kept improving his English through JAM practice. He also launched Instagram pages around startups and AI, hoping to share knowledge and inspire others. Rohan’s story shows a person who moves forward with intent. He learns fast, adapts quickly, and builds tools that hel

In [6]:
questions = [
    'Who is the person the story follows?',
    'What skills does Rohan learn during his journey?',
    'What work did he do at Smollan?',
    'How did he help the art team at MBS Studio?',
    'What project did he build that earned money?',
    'How does Rohan use automation in his plans?',
    'What future goal does he want to reach with AI tools?',
    'How does his story show his growth over time?'
]

text_runs = []

for q in questions:
    t0 = time.perf_counter()
    res = requests.post(f"{BASE_URL}/query/", json={"question": q, "top_k": 5})
    t1 = time.perf_counter()
    
    data = res.json()
    
    text_runs.append({
        "question": q,
        "answer": data["answer"],                   
        "answer_len": len(data["answer"].split()),
        "latency_sec": round(t1 - t0, 4),
        "sim_avg": data["metrics"]["similarity_stats"]["avg_score"],
        "rouge_avg": data["metrics"]["rouge_stats"]["avg_rouge"],
        "cost_usd": data["metrics"]["cost_info"]["estimated_cost_usd"],
    })

df_text = pd.DataFrame(text_runs)
df_text


| | question | answer | answer_len | latency_sec | sim_avg | rouge_avg | cost_usd |
|---|---|---|---|---|---|---|---|
| 0 | Who is the person the story follows? | The person the story follows is Rohan. | 7 | 7.6763 | 0.447896 | 0.053791 | 5.6e-05 |
| 1 | What skills does Rohan learn during his journey? | Based on Rohan's journey, here are the skills ... | 163 | 26.3094 | 0.70512 | 0.024035 | 0.000131 |
| 2 | What work did he do at Smollan? | At Smollan, Rohan worked on a business analyti... | 20 | 15.0785 | 0.57233 | 0.030276 | 0.000117 |
| 3 | How did he help the art team at MBS Studio? | At MBS Studio, Rohan helped artists generate b... | 13 | 14.9716 | 0.569951 | 0.048058 | 0.000116 |
| 4 | What project did he build that earned money? | The project that earned money was a "book-gene... | 9 | 15.1607 | 0.547395 | 0.041963 | 0.000116 |
| 5 | How does Rohan use automation in his plans? | Rohan uses automation by building systems that... | 103 | 17.4203 | 0.727412 | 0.041919 | 8.9e-05 |
| 6 | What future goal does he want to reach with AI... | The future goal he wants to reach with AI tool... | 61 | 14.4358 | 0.718705 | 0.076189 | 8.5e-05 |
| 7 | How does his story show his growth over time? | Rohan's story shows his growth over time in se... | 200 | 31.0341 | 0.514125 | 0.032845 | 0.000135 |
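
The loop above indexes straight into `data["metrics"][...]`, which raises `KeyError` if a response arrives without the expected fields (for example, an error payload). A small sketch of a safer lookup follows. `dig` is a hypothetical helper, and the sample payload is made up to mirror the field names used above.

```python
from typing import Any


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts by key, returning `default` if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Illustrative payload only; it mirrors the field names used in the loop.
sample = {"answer": "...", "metrics": {"similarity_stats": {"avg_score": 0.45}}}

print(dig(sample, "metrics", "similarity_stats", "avg_score"))  # 0.45
print(dig(sample, "metrics", "rouge_stats", "avg_rouge"))       # None
```

With this, a missing metric shows up as `None` (rendered as `NaN` or `None` in the DataFrame) rather than aborting the whole run.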


In [7]:
for row in text_runs:
    print(f"Question: {row['question']}\n")
    print(f"Answer:\n{row['answer']}\n")
    print("=" * 80, "\n")


Question: Who is the person the story follows?

Answer:
The person the story follows is Rohan.


Question: What skills does Rohan learn during his journey?

Answer:
Based on Rohan's journey, here are the skills he learned:

1. **AI**: How to shape models, test systems, and use AI prompts for various applications (e.g., generating concept art).
2. **Data Science**: Improved model accuracy in his work at Smollan.
3. **Cloud**: Built a web app at Whatsbuild that connected with people in real time.
4. **Full-Stack Work**: Built multiple projects on his own, including a behaviour-tracking tool and a chatbot.
5. **Automation**: Found a strong interest in building systems that remove slow steps from everyday work.

Additionally, Rohan also developed:

1. **Project Management**: Successfully completed multiple personal projects, each of which pushed him to learn something new.
2. **Communication Skills**: Improved his English through JAM practice and launched Instagram pages to share knowledge

## Image Pipeline - Ingestion (Caption + Embed)

In [8]:
image_files = [
    "examples/photo_1.jpeg",        
    "examples/complex_data_v.jpg"    
]

img_ingests = []

for path in image_files:
    with open(path, "rb") as f:
        res = requests.post(
            f"{BASE_URL}/ingest/ingest",
            files={"file": (path.split("/")[-1], f, "image/jpeg")}  # "image/jpeg" is the registered MIME type for .jpg/.jpeg
        )
    info = res.json()
    print(path, "→", info)
    img_ingests.append(info)


examples/photo_1.jpeg → {'doc_id': 'c73d2bc8-5edb-4a88-a5b7-4a1b43673d3b', 'type': 'image', 'caption': 'A group of people posing for a photo at an event, with some holding awards and plaques.'}
examples/complex_data_v.jpg → {'doc_id': '0b9dbccc-8548-4a3e-8fbc-f3cdd306da5f', 'type': 'image', 'caption': 'The image shows a map with various lines connecting different locations, suggesting a network of connections or travel routes, displayed in blue and red on a white background.'}
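
The content type of an upload can be derived from the file name instead of being typed by hand, which avoids mismatches when text, PDF, and image files go through the same endpoint. This is a sketch using the standard-library `mimetypes` module. `upload_name_and_type` is a hypothetical helper, and whether the server inspects the declared content type at all is an assumption.

```python
import mimetypes
import os
from typing import Tuple


def upload_name_and_type(path: str) -> Tuple[str, str]:
    """Return (file name, MIME type) for a multipart upload."""
    name = os.path.basename(path)
    mime, _encoding = mimetypes.guess_type(name)
    # Fall back to a generic binary type when the extension is unknown.
    return name, mime or "application/octet-stream"
```

The pair can be passed to `requests` as `files={"file": (name, f, mime)}`.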


## Test RAG over images only

In [9]:
payload = {
    "question": "What kind of event is shown on the stage photo?",
    "top_k": 5
}

res = requests.post(f"{BASE_URL}/query/", json=payload)
img_rag = res.json()
pp(img_rag["answer"])
pp(img_rag["results"][:2])


"Based on the image of people seated and standing on a stage, possibly receiving awards or recognitions for their achievements, I would say that the event shown on the stage photo is likely an **Award Ceremony**."
[
  {
    "score": 0.72644967,
    "rougeL": 0.23529411764705882,
    "doc_id": "08904513-7a0e-432e-b291-e186f685c2ea",
    "chunk_id": "d272d0e2-568a-483b-9fc8-614a63db5b53",
    "text": "This is a photograph of an event where people are seated and standing on a stage, possibly receiving awards or recognitions for their achievements.",
    "metadata": {
      "doc_id": "08904513-7a0e-432e-b291-e186f685c2ea",
      "type": "image",
      "source": "caption",
      "text": "This is a photograph of an event where people are seated and standing on a stage, possibly receiving awards or recognitions for their achievements."
    }
  },
  {
    "score": 0.62595356,
    "rougeL": 0.14814814814814817,
    "doc_id": "c73d2bc8-5edb-4a88-a5b7-4a1b43673d3b",
    "chunk_id": "67be8c1b-eb13

## Multimodal - Image + Text Query

In [10]:
with open("examples/complex_data_v.jpg", "rb") as f:
    res_mm = requests.post(
        f"{BASE_URL}/multimodal/",
        files={"file": ("complex_data_v.jpg", f, "image/jpeg")},  # "image/jpeg" is the registered MIME type
        data={
            "query": "What story does this data visualization tell? Are there any outliers that need attention?"
        }
    )

img_mm = res_mm.json()
pp(img_mm)


{
  "final_answer": "To provide a comprehensive answer, I'll combine the information from the data visualization (image) and the retrieved knowledge (RAG). **Step 1: Analyze the Data Visualization** The image is a map-like diagram with numerous lines connecting various locations. This suggests that the network diagram represents travel or connections between different entities or concepts such as cities, locations, and types of travel. **Step 2: Identify Outliers in the Network Diagram** Upon closer inspection, I notice that there are several nodes (representing locations) with a high degree of connectivity (many lines connecting them). These nodes appear to be hubs or central points in the network. One such node is labeled \"University of California\" which suggests it might be an important location for research or collaboration. **Step 3: Integrate RAG Context** The top-ranked RAG response (score: 0.70467246) describes the image as a colorful network diagram illustrating complex rela

In [11]:
print("IMAGE CAPTION:\n", img_mm["image_caption"], "\n")
print("FINAL ANSWER (trimmed):\n", img_mm["final_answer"][:700], "...")


IMAGE CAPTION:
 The image is a map-like diagram with numerous lines connecting various locations, indicating travel or connections between them. 

FINAL ANSWER (trimmed):
 To provide a comprehensive answer, I'll combine the information from the data visualization (image) and the retrieved knowledge (RAG). **Step 1: Analyze the Data Visualization** The image is a map-like diagram with numerous lines connecting various locations. This suggests that the network diagram represents travel or connections between different entities or concepts such as cities, locations, and types of travel. **Step 2: Identify Outliers in the Network Diagram** Upon closer inspection, I notice that there are several nodes (representing locations) with a high degree of connectivity (many lines connecting them). These nodes appear to be hubs or central points in the network. One such nod ...


## Multimodal Metrics

In [12]:
mm_runs = []

mm_runs.append({
    "query": "What story does this data visualization tell?",
    
    "answer_len": len(img_mm["final_answer"].split()),
    
    "sim_avg": img_mm["rag_metrics"]["similarity_stats"]["avg_score"],
    "rouge_avg": img_mm["rag_metrics"]["rouge_stats"]["avg_rouge"],
    "hit_rate": img_mm["rag_metrics"]["hit_rate"],
})

df_mm = pd.DataFrame(mm_runs)
df_mm


| | query | answer_len | sim_avg | rouge_avg | hit_rate |
|---|---|---|---|---|---|
| 0 | What story does this data visualization tell? | 532 | 0.591013 | 0.185226 | 0.6 |
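
For intuition about the `rouge_avg` column, ROUGE-L scores two texts by their longest common subsequence (LCS) of tokens. The sketch below computes an LCS-based F1 with plain whitespace tokenization. It illustrates the metric only; how the service tokenizes text and which ROUGE implementation it uses are not shown in this notebook, so its numbers may differ.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over lowercased, whitespace-split tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A long generated answer compared against a short retrieved chunk has low precision by construction, which is one reason ROUGE values can stay small even when the answer is grounded in the context.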


## Latency Summary

In [13]:
import re

lat_lines = []

with open("logs/latency.log", "r") as f:
    for line in f:
        # pattern: [NAME] took X seconds
        m = re.search(r"\[(.+?)\] took ([0-9.]+) seconds", line)
        if m:
            lat_lines.append({
                "component": m.group(1),
                "latency_sec": float(m.group(2))
            })

df_lat = pd.DataFrame(lat_lines)
df_lat


| | component | latency_sec |
|---|---|---|
| 0 | RAG Search | 0.2047 |
| 1 | LLaMA RAG Inference | 0.0575 |
| 2 | RAG Search | 0.1375 |
| 3 | LLaMA RAG Inference | 0.0297 |
| 4 | RAG Search | 0.1583 |
| ... | ... | ... |
| 171 | LLaMA RAG Inference | 15.9350 |
| 172 | LLaVA Captioning | 20.2958 |
| 173 | RAG Search | 0.1574 |
| 174 | LLaMA Inference | 98.5735 |


In [14]:
df_lat.groupby("component")["latency_sec"].mean().round(4)


component
LLaMA Inference           24.4577
LLaMA RAG Inference        8.9487
LLaVA Captioning          14.6321
Multimodal Query          83.3058
RAG Search                 0.1636
Whisper Transcription    100.0632
Name: latency_sec, dtype: float64
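
A mean can hide a wide spread: the log rows shown earlier include `LLaMA RAG Inference` samples of 0.0575 s, 0.0297 s, and 15.9350 s. The sketch below reports count, mean, median, and max per component using only the standard library and the same log pattern as the parsing cell. `summarize_latency` is a hypothetical helper, and the demo lines are rebuilt from the displayed rows under the assumption that the log uses the `[NAME] took X seconds` format.

```python
import re
import statistics
from collections import defaultdict
from typing import Dict, Iterable

LATENCY_PATTERN = re.compile(r"\[(.+?)\] took ([0-9.]+) seconds")


def summarize_latency(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Group latency samples by component and summarize each group."""
    samples = defaultdict(list)
    for line in lines:
        match = LATENCY_PATTERN.search(line)
        if match:
            samples[match.group(1)].append(float(match.group(2)))
    return {
        name: {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "max": max(values),
        }
        for name, values in samples.items()
    }


# Demo lines rebuilt from rows displayed in the table above.
demo_lines = [
    "[RAG Search] took 0.2047 seconds",
    "[LLaMA RAG Inference] took 0.0575 seconds",
    "[RAG Search] took 0.1375 seconds",
    "[LLaMA RAG Inference] took 0.0297 seconds",
    "[LLaMA RAG Inference] took 15.9350 seconds",
]
demo_summary = summarize_latency(demo_lines)
```

For these five lines the `LLaMA RAG Inference` median is 0.0575 s while the mean is above 5 s, so reporting both gives a more honest picture than the mean alone.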

## Cost Summary

In [15]:
df_text[["question", "cost_usd"]]
print("Total text RAG cost (simulated):", df_text["cost_usd"].sum())

Total text RAG cost (simulated): 0.0008450999999999999


In [16]:
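summary = {
    # Ingestion checks inspect the ingestion responses rather than the query results.
    "text_ingestion_ok": "doc_id" in text_ingest,
    "image_ingestion_ok": len(img_ingests) > 0 and all("doc_id" in i for i in img_ingests),
    "rag_answer_nonempty": all(df_text["answer_len"] > 0),
    "multimodal_answer_nonempty": len(img_mm["final_answer"].strip()) > 0,
}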

summary


{'text_ingestion_ok': True,
 'image_ingestion_ok': True,
 'rag_answer_nonempty': True,
 'multimodal_answer_nonempty': True}

## PDF Ingestion + RAG

In [18]:
with open("examples/Work_Division_Student_Hub_Project.pdf", "rb") as f:
    res = requests.post(
        f"{BASE_URL}/ingest/ingest",
        files={"file": ("Work_Division_Student_Hub_Project.pdf", f, "application/pdf")}
    )

print(res.json())


{'doc_id': 'f89dfd48-c0dd-4819-bdad-cc89d52b6573', 'type': 'text', 'chunks': 2, 'source': 'pdf → text (in memory)'}
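
Text chunks, PDF chunks, and image captions are all queried through the same `/query/` endpoint, so a question about one document can pull in context from another. A small sketch for checking which document types a result set draws on follows. It assumes each entry in `results` carries a `metadata` dict with a `type` field, as in the image query output earlier. `result_type_counts` is a hypothetical helper, and the sample entries are made up.

```python
from collections import Counter
from typing import Any, Dict, List


def result_type_counts(results: List[Dict[str, Any]]) -> Counter:
    """Count retrieved results by metadata type ('text', 'image', ...)."""
    return Counter(
        (item.get("metadata") or {}).get("type", "unknown")
        for item in results
    )


# Made-up entries that follow the shape of the `results` list shown earlier.
sample_results = [
    {"score": 0.71, "metadata": {"type": "image", "source": "caption"}},
    {"score": 0.64, "metadata": {"type": "text", "source": "text"}},
    {"score": 0.60, "metadata": {"type": "image", "source": "caption"}},
    {"score": 0.55},  # entry without metadata
]
sample_counts = result_type_counts(sample_results)
```

Calling `result_type_counts(res.json()["results"])` after a query would show at a glance whether an answer about a text document was built partly from image captions.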


In [19]:
payload = {
    "question": "What does the sample document talk about?",
    "top_k": 5
}

res = requests.post(f"{BASE_URL}/query/", json=payload)
text_rag = res.json()
pp(text_rag)


{
  "answer": "The sample document talks about an event where people are receiving awards or recognitions for their achievements. However, there is no direct mention of what the event is specifically honoring. The context suggests it may be related to some kind of achievement or accomplishment in a field like academia, technology, or travel.",
  "context": [
    "2 APIs for revision question generation and text summarization. 3 Documentation of AI pipeline and model usage. Member 4 – Project Manager & Integration/Testing Engineer Responsibilities: 1 Oversee team coordination, progress tracking, and documentation. 2 Handle system integration between frontend, backend, and AI modules. 3 Perform unit, integration, and usability testing. 4 Prepare project documentation (Problem statement, SRS, design diagrams, test report, and final report). 5 Manage deployment (GitHub Pages, Firebase, or Render). Deliverables: 1 Working integrated system. 2 Complete project documentation and final present