# ResearchHub Infrastructure Health Check

This notebook verifies that all Docker services are running and healthy.

**Prerequisites:** Docker Desktop must be running and services started with:
```bash
docker compose up -d
```

## Services Checked
| # | Service | Port | Purpose |
|---|---------|------|---------|
| 1 | PostgreSQL | 5432 | Main application database |
| 2 | Redis | 6379 | Application cache |
| 3 | OpenSearch | 9200 | Vector database for RAG |
| 4 | OpenSearch Dashboards | 5601 | OpenSearch web UI |
| 5 | Ollama | 11434 | Local LLM inference |
| 6 | FastAPI Backend | 8000 | Application API |
| 7 | Frontend (Nginx) | 3000 | React web app |
| 8 | Airflow | 8080 | Pipeline orchestration |
| 9 | Langfuse Web | 3001 | LLM observability UI |
| 10 | Langfuse Postgres | 5433 | Langfuse database |
| 11 | Langfuse Redis | 6380 | Langfuse job queue |
| 12 | MinIO | 9090 | S3-compatible storage |
| 13 | MinIO Console | 9091 | MinIO web UI |
| 14 | ClickHouse | - | Analytics DB (internal only) |

## Setup: Install Required Packages

In [1]:
# Dependencies check — works with uv-managed environments
import sys
import subprocess

subprocess.run(
    [sys.executable, '-m', 'pip', 'install', 'requests', 'psycopg2-binary', 'redis', 'opensearch-py'],
    capture_output=True
)
print('Dependencies ready.')

Dependencies ready.


## Health Check Framework

In [2]:
import requests
import socket
import time
from datetime import datetime

# ─── Result tracking ───────────────────────────────────────────────────────────
results = []

def check(name: str, status: bool, detail: str = "", warning: str = ""):
    """Record a health check result and print it immediately."""
    icon = "✅" if status else "❌"
    warn = f"  ⚠️  {warning}" if warning else ""
    print(f"{icon} {name:<35} {detail}{warn}")
    results.append({"service": name, "healthy": status, "detail": detail})

def http_get(url: str, timeout: int = 5) -> tuple[bool, str]:
    """Try an HTTP GET and return (success, detail)."""
    try:
        r = requests.get(url, timeout=timeout)
        return r.status_code < 500, f"HTTP {r.status_code}"
    except requests.exceptions.ConnectionError:
        return False, "Connection refused"
    except requests.exceptions.Timeout:
        return False, "Timeout"
    except Exception as e:
        return False, str(e)

def port_open(host: str, port: int, timeout: int = 3) -> tuple[bool, str]:
    """Check if a TCP port is open."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"Port {port} open"
    except (ConnectionRefusedError, socket.timeout):
        return False, f"Port {port} closed/unreachable"
    except Exception as e:
        return False, str(e)

print(f"Health check started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("-" * 70)

Health check started at 2026-02-19 08:58:31
----------------------------------------------------------------------


## 1. PostgreSQL

In [3]:
import psycopg2

try:
    conn = psycopg2.connect(
        host="localhost",
        port=5432,
        dbname="rag_db",
        user="rag_user",
        password="rag_password",
        connect_timeout=5
    )
    cur = conn.cursor()
    cur.execute("SELECT version();")
    version = cur.fetchone()[0].split(',')[0]  # e.g. "PostgreSQL 16.x"
    cur.execute("SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';")
    table_count = cur.fetchone()[0]
    conn.close()
    check("PostgreSQL (app DB)", True, f"{version} | {table_count} public tables")
except Exception as e:
    check("PostgreSQL (app DB)", False, str(e))

✅ PostgreSQL (app DB)                 PostgreSQL 16.12 on aarch64-unknown-linux-musl | 0 public tables


## 2. Redis

In [4]:
import redis as redis_lib

try:
    r = redis_lib.Redis(host="localhost", port=6379, socket_timeout=5)
    pong = r.ping()
    info = r.info("server")
    version = info.get("redis_version", "unknown")
    memory = info.get("used_memory_human", "unknown")
    check("Redis (app cache)", pong, f"v{version} | Memory: {memory}")
except Exception as e:
    check("Redis (app cache)", False, str(e))

✅ Redis (app cache)                   v7.4.7 | Memory: unknown


## 3. OpenSearch

In [5]:
try:
    r = requests.get("http://localhost:9200/_cluster/health", timeout=10)
    if r.status_code == 200:
        data = r.json()
        status = data.get("status", "unknown")   # green / yellow / red
        nodes = data.get("number_of_nodes", 0)
        healthy = status in ("green", "yellow")  # yellow is OK for single-node dev
        warning = "yellow is OK for single-node dev" if status == "yellow" else ""
        check("OpenSearch", healthy, f"Cluster: {status} | Nodes: {nodes}", warning)
    else:
        check("OpenSearch", False, f"HTTP {r.status_code}")
except Exception as e:
    check("OpenSearch", False, str(e))

# Also check the indices
try:
    r = requests.get("http://localhost:9200/_cat/indices?v&h=index,health,docs.count", timeout=5)
    if r.status_code == 200 and r.text.strip():
        print(f"\n  OpenSearch indices:\n{r.text}")
    else:
        print("  No indices yet (expected on fresh setup)")
except:
    pass

✅ OpenSearch                          Cluster: yellow | Nodes: 1  ⚠️  yellow is OK for single-node dev

  OpenSearch indices:
index                        health docs.count
.plugins-ml-config           green           1
.opensearch-observability    green           0
top_queries-2026.02.18-95559 yellow         11
.ql-datasources              green           0
.kibana_1                    green           1
top_queries-2026.02.19-95560 yellow          7



## 4. OpenSearch Dashboards

In [10]:
ok, detail = http_get("http://localhost:5601/api/status")
check("OpenSearch Dashboards", ok, detail)

✅ OpenSearch Dashboards               HTTP 200


## 5. Ollama (Local LLM)

In [11]:
try:
    r = requests.get("http://localhost:11434/api/tags", timeout=10)
    if r.status_code == 200:
        models = r.json().get("models", [])
        model_names = [m["name"] for m in models]
        if model_names:
            check("Ollama", True, f"Models: {', '.join(model_names)}")
        else:
            check("Ollama", True, "Running — no models pulled yet",
                  "Run: docker exec -it rag-ollama ollama pull llama3.2:1b")
    else:
        check("Ollama", False, f"HTTP {r.status_code}")
except Exception as e:
    check("Ollama", False, str(e))

✅ Ollama                              Models: llama3.2:1b


## 6. FastAPI Backend

In [12]:
try:
    r = requests.get("http://localhost:8000/api/v1/health", timeout=10)
    if r.status_code == 200:
        data = r.json()
        check("FastAPI Backend", True, f"Status: {data}")
    else:
        check("FastAPI Backend", False, f"HTTP {r.status_code}: {r.text[:100]}")
except Exception as e:
    check("FastAPI Backend", False, str(e))

# Also check the OpenAPI docs are accessible
ok, detail = http_get("http://localhost:8000/docs")
print(f"  OpenAPI docs: {'✅' if ok else '❌'} {detail} → http://localhost:8000/docs")

✅ FastAPI Backend                     Status: {'status': 'ok', 'service': 'researchhub-api', 'version': '0.1.0'}
  OpenAPI docs: ✅ HTTP 200 → http://localhost:8000/docs


## 7. Frontend (Nginx)

In [13]:
ok, detail = http_get("http://localhost:3000")
check("Frontend (Nginx)", ok, detail)

✅ Frontend (Nginx)                    HTTP 200


## 8. Airflow

In [21]:
try:
    r = requests.get("http://localhost:8080/health", timeout=15)
    if r.status_code == 200:
        data = r.json()
        scheduler_status = data.get("scheduler", {}).get("status", "unknown")
        check("Airflow", True, f"Scheduler: {scheduler_status}")
    else:
        check("Airflow", False, f"HTTP {r.status_code}")
except Exception as e:
    check("Airflow", False, str(e),
          "Airflow takes ~2 minutes to initialize on first run")

# Check DAG count via REST API
try:
    r = requests.get(
        "http://localhost:8080/api/v1/dags",
        auth=("admin", "admin"),
        timeout=5
    )
    if r.status_code == 200:
        dags = r.json().get("dags", [])
        print(f"  DAGs loaded: {len(dags)} {'(' + ', '.join(d['dag_id'] for d in dags) + ')' if dags else '(none yet)'}")
except:
    pass

✅ Airflow                             Scheduler: healthy


## 9. Langfuse Web

In [15]:
try:
    r = requests.get("http://localhost:3001/api/public/health", timeout=10)
    if r.status_code == 200:
        check("Langfuse Web", True, f"HTTP {r.status_code}")
    else:
        check("Langfuse Web", False, f"HTTP {r.status_code}")
except Exception as e:
    check("Langfuse Web", False, str(e))

✅ Langfuse Web                        HTTP 200


## 10. Langfuse PostgreSQL

In [16]:
try:
    conn = psycopg2.connect(
        host="localhost",
        port=5433,  # Note: different port from app postgres
        dbname="langfuse",
        user="langfuse",
        password="langfuse",
        connect_timeout=5
    )
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';")
    table_count = cur.fetchone()[0]
    conn.close()
    check("Langfuse PostgreSQL", True, f"{table_count} public tables")
except Exception as e:
    check("Langfuse PostgreSQL", False, str(e))

✅ Langfuse PostgreSQL                 63 public tables


## 11. Langfuse Redis

In [17]:
try:
    r = redis_lib.Redis(
        host="localhost",
        port=6380,  # Note: different port from app redis
        password="langfuse_redis_password",
        socket_timeout=5
    )
    pong = r.ping()
    check("Langfuse Redis", pong, "Port 6380")
except Exception as e:
    check("Langfuse Redis", False, str(e))

✅ Langfuse Redis                      Port 6380


## 12 & 13. MinIO (Object Storage)

In [18]:
# MinIO S3 API
ok, detail = http_get("http://localhost:9090/minio/health/live")
check("MinIO S3 API", ok, detail)

# MinIO Console
ok, detail = http_get("http://localhost:9091")
check("MinIO Console", ok, detail)

✅ MinIO S3 API                        HTTP 200
✅ MinIO Console                       HTTP 200


## 14. ClickHouse (Internal — no exposed port)

In [19]:
# ClickHouse HTTP interface is only accessible inside Docker network
# We verify it indirectly by checking if Langfuse services are healthy
# (they depend on ClickHouse with condition: service_healthy)
langfuse_ok = any(r["service"] == "Langfuse Web" and r["healthy"] for r in results)
check("ClickHouse", langfuse_ok, 
      "Healthy (inferred — Langfuse depends on it)" if langfuse_ok else "Cannot verify directly",
      "ClickHouse has no exposed port — only accessible inside Docker network")

✅ ClickHouse                          Healthy (inferred — Langfuse depends on it)  ⚠️  ClickHouse has no exposed port — only accessible inside Docker network
