# 🧪 Multimodal RAG Chatbot: Research & Experimentation


## 1. Environment & API Setup

In this section, we verify the connection to our cloud providers: **Groq** for LLM inference and **Google AI** for vision/embeddings.

In [1]:
import os
from dotenv import load_dotenv
from groq import Groq

# Load environment variables from the .env file one directory up.
load_dotenv(dotenv_path="../.env")

def test_groq():
    """Send a one-line prompt to Groq and return the model's reply."""
    client = Groq(api_key=os.getenv("GROQ_API_KEY"))
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Test response."}]
    )
    return completion.choices[0].message.content

print(f"Groq Status: {test_groq()}")

Groq Status: This is a test response. Everything seems to be working correctly. If you have any questions or need assistance, feel free to ask.
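Before calling a provider, it can help to confirm that the keys were actually loaded. The following is a minimal sketch using only the standard library. `missing_env_vars` is a hypothetical helper, and `GOOGLE_API_KEY` is an assumed variable name (only `GROQ_API_KEY` appears in the cell above).

```python
import os

def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]

# GOOGLE_API_KEY is an assumed name; adjust it to match your .env file.
missing = missing_env_vars(["GROQ_API_KEY", "GOOGLE_API_KEY"])
if missing:
    print(f"Missing keys: {', '.join(missing)}")
else:
    print("All API keys are set.")
```

Passing a plain dictionary as `environ` makes the helper easy to check without touching the real environment.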


## 2. Embedding Model Logic

The project switched from Google Gemini embeddings to a local **HuggingFace** model (`all-MiniLM-L6-v2`). Because the model runs on the local machine, it is free to use, is not subject to API rate limits, and is more reliable than a remote call.

In [2]:
from langchain_huggingface import HuggingFaceEmbeddings

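# The model runs locally, so no API key is needed.
embeddings_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
sample_text = "This is a test for vector embeddings."
# embed_query returns the embedding as a list of floats.
vector = embeddings_model.embed_query(sample_text)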

print(f"Vector Dimension Size: {len(vector)}")
print(f"Sample (first 5 values): {vector[:5]}")

Vector Dimension Size: 384
Sample (first 5 values): [-0.014902915805578232, -0.04926351085305214, 0.0358581505715847, -0.023814814165234566, 0.08244580030441284]


## 3. Multimodal Analysis (Vision)

This section tests how **Llama 3.2 Vision** on Groq can convert complex images and charts from a PDF into searchable text captions.
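A captioning call might look roughly like the sketch below. It assumes Groq's OpenAI-compatible chat format, in which an image is passed as a base64 data URL inside an `image_url` content part. The model ID `llama-3.2-11b-vision-preview` is an assumption and may no longer be available, so check Groq's current model list. `build_vision_messages` and `caption_image` are hypothetical helper names, and the default prompt is illustrative only.

```python
import base64

# Assumed model ID; verify it against Groq's current model list.
VISION_MODEL = "llama-3.2-11b-vision-preview"

def build_vision_messages(image_bytes, prompt, mime_type="image/png"):
    """Build a chat message that pairs a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }]

def caption_image(client, image_bytes,
                  prompt="Describe this chart in detail for search indexing."):
    """Request a caption; `client` is expected to be a groq.Groq instance."""
    completion = client.chat.completions.create(
        model=VISION_MODEL,
        messages=build_vision_messages(image_bytes, prompt),
    )
    return completion.choices[0].message.content
```

With the `Groq` client from Section 1, usage would be along the lines of `caption_image(client, open("chart.png", "rb").read())`, where `chart.png` stands in for an image extracted from the PDF. The returned caption can then be embedded like any other text chunk.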

## 4. Vector Search Strategy

The final step uses **MongoDB Atlas Vector Search** to retrieve the stored chunks whose embeddings are closest to the query embedding.
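As a rough illustration, a query against Atlas could be expressed as an aggregation pipeline built around the `$vectorSearch` stage. The sketch below only constructs the pipeline. The index name `vector_index` and the field names `embedding` and `text` are assumptions, not values taken from this project; the 384-element query vector matches the embedding size shown in Section 2.

```python
def build_vector_search_pipeline(query_vector, limit=5, num_candidates=100,
                                 index_name="vector_index", path="embedding"):
    """Build an aggregation pipeline around the Atlas $vectorSearch stage."""
    return [
        {"$vectorSearch": {
            "index": index_name,              # assumed index name
            "path": path,                     # assumed field holding the vectors
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,  # candidates considered before ranking
            "limit": limit,                   # number of results returned
        }},
        {"$project": {
            "_id": 0,
            "text": 1,                        # assumed field holding the chunk text
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]

# Placeholder query vector with the same length as the embeddings above.
pipeline = build_vector_search_pipeline([0.1] * 384, limit=3)
print(pipeline[0]["$vectorSearch"]["limit"])
```

With `pymongo`, the pipeline would be passed to `collection.aggregate(pipeline)`. The similarity metric is not part of the query; it is chosen when the vector index is defined.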

### Why Cosine Similarity?
We use cosine similarity because it compares the *direction* of two vectors and ignores their *magnitude*. This suits text, where meaning should match regardless of document length.
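The direction-versus-magnitude point can be checked with a small, dependency-free sketch. The three-dimensional vectors are made-up toy values, not real embeddings, and `cosine_similarity` is written out by hand here rather than taken from any library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0, 20.0, 30.0]  # same direction as short_doc, 10x the magnitude
unrelated = [3.0, 0.0, -1.0]   # orthogonal to short_doc (dot product is 0)

print(round(cosine_similarity(short_doc, long_doc), 6))
print(round(cosine_similarity(short_doc, unrelated), 6))
```

Scaling a vector leaves its score against any other vector unchanged, which is the property the paragraph above relies on.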