# 🔍 RAG Demo with LangChain, Ollama, and pgvector (Dockerized)

This notebook demonstrates a minimal Retrieval-Augmented Generation (RAG) pipeline using:

- 🧠 **Ollama** (via `azaddjan/ollama:latest`) to run LLMs and generate embeddings locally
- 🗃️ **pgvector** (via `azaddjan/pgvector:latest`) to store and query document embeddings
- 🔗 **LangChain** to orchestrate embedding, retrieval, and generation

We'll embed two short documents with `nomic-embed-text`, store them in pgvector, retrieve the most similar ones for a query, and generate a final answer with `llama3`. Both services run in Docker for reproducibility.

---

### ▶️ Docker Compose Commands

```bash
# Start the services (Ollama on port 11500, Postgres on 5432)
docker-compose up -d

# Stop everything and remove volumes
docker-compose down -v
```

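After `docker-compose up -d` returns, the containers may still need a moment before they accept connections. The following is a minimal sketch of a readiness check; the helper name `wait_for_port` and its defaults are assumptions for illustration, not part of the original notebook.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows the port is open, not that the
            # service behind it has finished initializing.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the ports used here, one could call `wait_for_port("localhost", 11500)` for Ollama and `wait_for_port("localhost", 5432)` for Postgres before running the cells below.
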
In [None]:
# 🛠 Install required packages
!pip install --upgrade pip
!pip install -r requirements.txt

In [48]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.pgvector import PGVector
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document
import psycopg2
import os

# Set dummy API key required for ChatOpenAI
os.environ["OPENAI_API_KEY"] = "ollama"

# Database credentials
DB_USER = "postgres"
DB_PASSWORD = "postgres"
DB_HOST = "localhost"
DB_PORT = "5432"
DB_NAME = "test_rag"

postgres_connection_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
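
The f-string above works because the demo credentials contain no special characters. If a password included characters such as `@`, `:` or `/`, the URL would be ambiguous. A small sketch of percent-encoding the credentials first; the helper `build_postgres_url` and the second password are hypothetical, used only for illustration.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, port: str, dbname: str) -> str:
    """Build a postgresql:// URL, percent-encoding the user and password."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{dbname}"


# Demo credentials from this notebook: the result matches the plain f-string.
print(build_postgres_url("postgres", "postgres", "localhost", "5432", "test_rag"))

# Hypothetical password with reserved characters: they are escaped in the URL.
print(build_postgres_url("postgres", "p@ss:word/1", "localhost", "5432", "test_rag"))
```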

In [52]:
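from psycopg2 import sql

def create_test_database():
    """Create DB_NAME if it does not already exist."""
    # Connect to the default "postgres" database. CREATE DATABASE cannot run
    # inside a transaction block, so autocommit is enabled.
    conn = psycopg2.connect(
        dbname="postgres",
        user=DB_USER,
        password=DB_PASSWORD,
        host=DB_HOST,
        port=DB_PORT
    )
    conn.autocommit = True
    try:
        with conn.cursor() as cur:
            # Pass the name as a bound parameter instead of formatting it into the SQL.
            cur.execute("SELECT 1 FROM pg_database WHERE datname = %s;", (DB_NAME,))
            exists = cur.fetchone()
            if not exists:
                # Identifiers cannot be bound parameters, so quote the name with sql.Identifier.
                cur.execute(sql.SQL("CREATE DATABASE {};").format(sql.Identifier(DB_NAME)))
                print(f"✅ Database '{DB_NAME}' created.")
            else:
                print(f"ℹ️ Database '{DB_NAME}' already exists.")
    finally:
        conn.close()

create_test_database()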

ℹ️ Database 'test_rag' already exists.
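
Because `CREATE DATABASE` cannot take the database name as a bound parameter, one extra defensive option is to validate the name before it reaches any SQL. This is a sketch only: the helper `validate_db_name` and its naming rule (lowercase letters, digits and underscores, at most 63 characters) are assumptions chosen for this demo, not a PostgreSQL requirement.

```python
import re

# Hypothetical rule for this sketch: start with a lowercase letter or
# underscore, then lowercase letters, digits or underscores, 63 chars max.
_DB_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]{0,62}")


def validate_db_name(name: str) -> str:
    """Return the name unchanged if it matches the rule, else raise ValueError."""
    if not _DB_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Unsafe database name: {name!r}")
    return name


print(validate_db_name("test_rag"))
```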


In [53]:
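# Start from a clean slate: drop the embeddings table left over from earlier runs.
conn = psycopg2.connect(postgres_connection_string)
try:
    # `with conn` commits on success and rolls back on error.
    with conn, conn.cursor() as cur:
        cur.execute("DROP TABLE IF EXISTS langchain_pg_embedding;")
finally:
    # psycopg2's `with conn` ends the transaction but does not close the connection.
    conn.close()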

In [54]:
embedding_model = OllamaEmbeddings(
    model="nomic-embed-text:v1.5",
    base_url="http://localhost:11500"
)

texts = [
    "LangChain makes building LLM apps easier and modular.",
    "Ollama runs open-source language models locally using containers.",
]

metadatas = [{"source": "doc1"}, {"source": "doc2"}]

vectorstore = PGVector.from_texts(
    texts=texts,
    embedding=embedding_model,
    metadatas=metadatas,
    collection_name="rag_demo",
    connection_string=postgres_connection_string,
    use_jsonb=True
)
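
To make the retrieval step less of a black box, here is a toy sketch of what a similarity search does conceptually: rank stored vectors by their distance to the query vector. It assumes cosine distance is the metric in use, and the three-dimensional vectors are made up for illustration; real embeddings from `nomic-embed-text` have many more dimensions, and pgvector performs this ranking inside Postgres rather than in Python.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return up to k (distance, doc_id) pairs, closest first."""
    scored = [(cosine_distance(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return sorted(scored)[:k]


# Hypothetical toy vectors standing in for stored document embeddings.
toy_vectors = {"doc1": [0.9, 0.1, 0.0], "doc2": [0.1, 0.8, 0.3]}
toy_query = [0.0, 0.9, 0.2]

for distance, doc_id in top_k(toy_query, toy_vectors, k=2):
    print(f"{doc_id}: distance={distance:.3f}")
```

In this toy data `doc2` ranks first because its vector points in nearly the same direction as the query.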

In [55]:
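query = "How can I use containers to run AI models locally?"
# similarity_search returns up to k=4 documents by default; only two are stored here.
retrieved_docs = vectorstore.similarity_search(query)
# Join the retrieved passages into one context string for the prompt.
context = "\n".join(doc.page_content for doc in retrieved_docs)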

prompt = ChatPromptTemplate.from_template(
    "Use the following context to answer the question:\n\n{context}\n\nQ: {question}\nA:"
)

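# Ollama serves an OpenAI-compatible API under /v1, so ChatOpenAI can target it.
# The API key is a placeholder that Ollama does not check.
llm = ChatOpenAI(
    model="llama3:8b",
    openai_api_base="http://localhost:11500/v1",
    openai_api_key="ollama"
)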

chain = prompt | llm

response = chain.invoke({
    "context": context,
    "question": query,
})

print("🔍 RAG Answer:\n", response.content)

🔍 RAG Answer:
 You can use containers to run AI models, such as Ollama's open-source language models, locally by leveraging containerization technology. This allows you to isolate the model and its dependencies in a self-contained environment, making it easier to manage and deploy.

Here are some steps to get started:

1. Choose your preferred container runtime, such as Docker, and install it on your local machine.
2. Grab the open-source language models from Ollama or another source, and convert them into a compatible format (e.g., PyTorch or TensorFlow).
3. Create a new container using your chosen runtime, specifying the necessary dependencies and environment variables for the model to run correctly.
4. Deploy the model inside the container by executing the relevant commands (e.g., `python train.py` or `python run.py`).
5. Use container-specific tools (e.g., port mappings or volume mounts) to connect to the running container, allowing you to interact with the model through APIs, shel