Curated OSINT resources for discovering exposed AI infrastructure: dorks, queries, tools, and techniques for LLM, AI agent, and ML pipeline reconnaissance.
175,000+ Ollama servers exposed · 370,000+ Grok conversations indexed · AI credential leaks up 81% YoY
AI OSINT is a curated collection of Google dorks, Shodan queries, GitHub dorks, Censys queries, Sigma detection rules, threat intelligence, and security tools for finding exposed artificial intelligence infrastructure on the internet. It covers LLM endpoints (Ollama, vLLM, LM Studio), AI chatbot conversation leaks (ChatGPT, Grok, Perplexity), vector databases (Qdrant, Weaviate, ChromaDB, Milvus, Pinecone), AI agent gateways (OpenClaw, MCP servers), MLOps platforms (MLflow, Jupyter, Kubeflow), leaked AI API keys (OpenAI, Anthropic, Google Gemini, HuggingFace, Groq, Replicate, Cohere, Mistral, DeepSeek, ElevenLabs), and AI image generation services (Stable Diffusion, ComfyUI). Designed for Red Team operators, penetration testers, bug bounty hunters, and OSINT researchers working in AI/ML security.
Organizations are deploying LLMs, vector databases, and AI agents faster than they can secure them. The result:
- Ollama, vLLM, Gradio: shipped with zero authentication by default
- ChatGPT, Grok: shared conversations indexed by search engines with API keys, passwords, PII
- MCP servers: exposed agent gateways with shell access, file system access, and stored credentials
- Qdrant, ChromaDB, MLflow: no auth out of the box, exposing embeddings, models, and experiments
This repository gives Red Team operators and OSINT professionals the exact queries to find it all.
Inspired by: 7WaySecurity/cloud_osint
| Section | What You'll Find |
|---|---|
| Google Dorks | Queries for ChatGPT, Grok, HuggingFace, dashboards, config files |
| GitHub Dorks | Leaked API keys for 20+ AI providers, MCP configs, system prompts |
| Shodan Queries | Ollama, vLLM, OpenClaw, Gradio, vector DBs, MLflow, Jupyter |
| Censys Queries | Alternative engine queries for all AI services |
| AI Service Endpoints | URL patterns, default ports, API fingerprinting |
| API Key Patterns | Prefixes, regex, validation for every major AI provider |
| Vector DB Recon | Endpoints to enumerate Qdrant, Weaviate, ChromaDB, Milvus |
| MCP & Agent Exposure | The most critical emerging attack surface in AI security |
| Threat Intelligence | Operation Bizarre Bazaar, Clawdbot crisis, MCP supply chain |
| Tools | AI-specific only: scanners, red team frameworks, key detectors |
| Detection Rules | Sigma rules for monitoring AI infrastructure |
Full collection:
dorks/google/
# Grok (xAI): 370K+ conversations indexed, NO opt-out for indexing
site:grok.com/share "password"
site:grok.com/share "API key"
site:grok.com/share "secret"
site:grok.com/share "token"
site:grok.com/share "credentials"
site:grok.com/share "sk-proj"
# ChatGPT: share-indexing feature removed Aug 2025, cached results diminishing
# Try DuckDuckGo, which continued indexing after Google stopped
site:chatgpt.com/share "API key"
site:chatgpt.com/share "sk-proj"
site:chatgpt.com/share "password"
site:chatgpt.com/share "AWS_SECRET"
# Perplexity AI
site:perplexity.ai/search "API key"
site:perplexity.ai/search "password"
# Claude (Anthropic): ~600 convos indexed by Google; 143K+ LLM chats (incl. Claude) on Archive.org
# 🔥 Original dork by 7WaySecurity
site:claude.ai "public/artifacts"
site:claude.ai/share "API key"
site:claude.ai/share "sk-ant"
site:claude.ai/share "password"
site:claude.ai/share "AWS"
site:claude.ai/share ".env"
site:web.archive.org "claude.ai/share"
# HuggingFace Spaces: keys hardcoded in public Git repos
site:huggingface.co/spaces "sk-proj"
site:huggingface.co/spaces "OPENAI_API_KEY"
site:huggingface.co/spaces "os.environ"
site:huggingface.co/spaces "st.secrets"
intitle:"MLflow" inurl:"/mlflow"
intitle:"Label Studio" inurl:"/projects"
intitle:"Jupyter Notebook" inurl:"/tree" -"Login"
intitle:"Kubeflow" inurl:"/pipeline"
intitle:"Airflow - DAGs"
intitle:"Gradio" inurl:":7860"
intitle:"Streamlit" inurl:":8501"
intitle:"Open WebUI" "ollama"
intitle:"Qdrant Dashboard"
intitle:"ComfyUI"
intitle:"Stable Diffusion"
intitle:"LiteLLM" "proxy"
filetype:env "OPENAI_API_KEY"
filetype:env "ANTHROPIC_API_KEY"
filetype:env "HUGGINGFACE_TOKEN"
filetype:env "GROQ_API_KEY"
filetype:env "PINECONE_API_KEY"
filetype:env "WANDB_API_KEY"
filetype:env "DEEPSEEK_API_KEY"
filetype:env "OPENROUTER_API_KEY"
filetype:yaml "openai" "api_key"
filetype:json "anthropic" "api_key"
Full collection:
dorks/github/
⚠️ Syntax note: GitHub migrated to the new Code Search. Use `path:*.env` instead of the legacy `filename:.env`. Queries below use the modern syntax where applicable.
# OpenAI (project keys β current format since April 2024)
"sk-proj-" path:*.env
"sk-proj-" path:*.py
"OPENAI_API_KEY" path:*.env NOT "your_key" NOT "example"
# Anthropic
"sk-ant-api03" path:*.env
"ANTHROPIC_API_KEY" path:*.env
# Google AI / Gemini
"AIzaSy" path:*.env "generativelanguage"
"GOOGLE_API_KEY" path:*.env "gemini"
# HuggingFace
"hf_" path:*.env
"HF_TOKEN" path:*.env
# Groq
"gsk_" path:*.env
"GROQ_API_KEY" path:*.env
# Replicate
"r8_" path:*.env
"REPLICATE_API_TOKEN" path:*.env
# Vector DBs & MLOps
"PINECONE_API_KEY" path:*.env
"QDRANT_API_KEY" path:*.env
"WANDB_API_KEY" path:*.env
path:mcp.json "api_key"
path:mcp.json "token"
path:.cursor/mcp.json
"mcpServers" path:*.json "apiKey"
"mcpServers" path:*.json "OPENAI_API_KEY"
"system_prompt" path:*.py "you are"
"SYSTEM_PROMPT" path:*.env
path:prompts.yaml "system"
path:train.jsonl "prompt" "completion"
path:dataset.jsonl "instruction" "output"
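The syntax note in this section can be applied mechanically to older dork lists. The following is a minimal sketch, assuming that only the `filename:` qualifier needs rewriting and that dot-prefixed names such as `.env` should become a `path:*` glob; the helper name `modernize_dork` is hypothetical and not part of this repository.

```python
import re

# Matches the legacy qualifier, e.g. filename:.env or filename:mcp.json
_LEGACY_FILENAME = re.compile(r"\bfilename:(\S+)")


def modernize_dork(query: str) -> str:
    """Rewrite legacy `filename:` qualifiers into `path:` qualifiers."""

    def _rewrite(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: extension-style names (".env") become a glob,
        # while full file names ("mcp.json") are passed through as-is.
        if name.startswith("."):
            return f"path:*{name}"
        return f"path:{name}"

    return _LEGACY_FILENAME.sub(_rewrite, query)


if __name__ == "__main__":
    for legacy in ['"sk-proj-" filename:.env', 'filename:mcp.json "token"']:
        print(modernize_dork(legacy))
```

Queries that already use `path:` pass through unchanged, so the function can be run over a mixed list.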
Full collection:
dorks/shodan/
# Ollama: 175,000+ exposed instances worldwide
"Ollama is running" port:11434
port:11434 http.html:"Ollama"
port:11434 "api/tags"
# vLLM / OpenAI-compatible
port:8000 "openai" "model"
http.title:"FastAPI" port:8000 "/v1/models"
# LM Studio
port:1234 "/v1/models"
# llama.cpp
port:8080 "llama" "completion"
# OpenClaw/Clawdbot: 4,000+ on Shodan, many with zero auth
# Enables RCE via prompt injection, API key theft, reverse shells
http.title:"Clawdbot Control" port:18789
http.title:"OpenClaw" port:18789
port:18789 "api/v1/status"
port:18789 "auth_mode"
http.title:"Gradio" port:7860
http.title:"Streamlit" port:8501
http.title:"Stable Diffusion" port:7860
http.title:"ComfyUI"
# Qdrant: NO auth by default
port:6333 "qdrant"
port:6333 "/collections"
# Weaviate
port:8080 "weaviate"
# Milvus
port:19530 "milvus"
# MLflow: CVE-2026-0545 (CVSS 9.1) RCE, no auth by default
http.title:"MLflow" port:5000
# Jupyter: ~10,000+ on Shodan, targeted by botnets
http.title:"Jupyter Notebook" port:8888 -"Login"
http.title:"JupyterLab" port:8888
# TensorBoard
http.title:"TensorBoard" port:6006
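On an authorized target, a Shodan hit for Ollama is typically confirmed by checking the root banner and then reading the model list from `/api/tags`. The sketch below only parses response bodies that have already been fetched; the function names are hypothetical, and the JSON shape (`{"models": [{"name": ...}]}`) is what Ollama returns from that endpoint.

```python
import json


def looks_like_ollama(root_body: str) -> bool:
    """Check the root page for the banner used in the Shodan query above."""
    return "Ollama is running" in root_body


def list_ollama_models(tags_body: str) -> list:
    """Return model names from the body of a GET /api/tags response."""
    try:
        data = json.loads(tags_body)
    except json.JSONDecodeError:
        return []  # not JSON, so probably not an Ollama API response
    if not isinstance(data, dict):
        return []
    models = data.get("models") or []
    return [m["name"] for m in models if isinstance(m, dict) and "name" in m]


if __name__ == "__main__":
    sample = '{"models": [{"name": "llama3:latest"}, {"name": "mistral:7b"}]}'
    print(looks_like_ollama("Ollama is running"), list_ollama_models(sample))
```

Keeping the parsing separate from the HTTP request makes it easy to run against saved responses from an engagement.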
Full collection:
dorks/censys/
# Ollama (Censys found 25%+ on non-default ports)
services.port=11434 AND services.http.response.body:"Ollama"
# Gradio
services.port=7860 AND services.http.response.html_title:"Gradio"
# OpenClaw
services.port=18789 AND services.http.response.body:"Clawdbot"
# Vector DBs
services.port=6333 AND services.http.response.body:"qdrant"
services.port=8080 AND services.http.response.body:"weaviate"
# MLOps
services.port=5000 AND services.http.response.html_title:"MLflow"
services.port=8888 AND services.http.response.html_title:"Jupyter"
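Because a share of Ollama hosts were found on non-default ports, it can help to generate each Censys query both with and without the port clause. This is a small sketch that only assembles query strings in the same form as the examples above; the helper name `censys_query` is hypothetical, and the port-less variant is an assumption that will likely return more false positives.

```python
from typing import Optional


def censys_query(field: str, needle: str, port: Optional[int] = None) -> str:
    """Build a query in the same shape as the Censys examples above."""
    clause = f'{field}:"{needle}"'
    if port is None:
        # Assumption: dropping the port clause widens the match to any port.
        return clause
    return f"services.port={port} AND {clause}"


if __name__ == "__main__":
    body = "services.http.response.body"
    print(censys_query(body, "Ollama", 11434))
    print(censys_query(body, "Ollama"))
```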
| Provider | API Endpoint | Shared Content |
|---|---|---|
| OpenAI | api.openai.com/v1/* | chatgpt.com/share/* |
| Anthropic | api.anthropic.com/v1/* | - |
| Google AI | generativelanguage.googleapis.com/v1beta/* | - |
| xAI | api.x.ai/v1/* | grok.com/share/* |
| Mistral | api.mistral.ai/v1/* | - |
| Cohere | api.cohere.ai/v1/* | - |
| DeepSeek | api.deepseek.com/v1/* | - |
| Groq | api.groq.com/openai/v1/* | - |
| HuggingFace | api-inference.huggingface.co/models/* | huggingface.co/spaces/* |
| ElevenLabs | api.elevenlabs.io/v1/* | - |
| Service | Port | Auth Default | Risk Level |
|---|---|---|---|
| Ollama | 11434 | None | 🔴 Critical |
| vLLM | 8000 | None | 🔴 Critical |
| OpenClaw | 18789 | None (fixed in rebrand) | 🔴 Critical |
| Gradio | 7860 | None | 🔴 Critical |
| Streamlit | 8501 | None | 🟡 High |
| MLflow | 5000 | None | 🔴 Critical |
| Qdrant | 6333/6334 | None | 🔴 Critical |
| Weaviate | 8080 (+ gRPC 50051) | None | 🟡 High |
| ChromaDB | 8000 | None (binds localhost) | 🟡 High |
| Milvus | 19530 (+ mgmt 9091) | None | 🟡 High |
| Jupyter | 8888 | Token (often disabled) | 🟡 High |
| LM Studio | 1234 | None | 🟡 High |
| GPT4All | 4891 | None | 🟡 High |
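The default-port table can double as a first-pass fingerprint. The sketch below maps a port to candidate services and narrows the list when the service name appears in a response body; the function name and the substring heuristic are assumptions for illustration, and a port match alone is never proof of what is running.

```python
# Default ports copied from the table above; some ports are shared.
DEFAULT_PORTS = {
    11434: ["Ollama"],
    8000: ["vLLM", "ChromaDB"],
    18789: ["OpenClaw"],
    7860: ["Gradio"],
    8501: ["Streamlit"],
    5000: ["MLflow"],
    6333: ["Qdrant"],
    6334: ["Qdrant"],
    8080: ["Weaviate"],
    19530: ["Milvus"],
    8888: ["Jupyter"],
    1234: ["LM Studio"],
    4891: ["GPT4All"],
}


def candidate_services(port: int, body: str = "") -> list:
    """Return likely services for a port, narrowed by the response body."""
    names = DEFAULT_PORTS.get(port, [])
    lowered = body.lower()
    # Heuristic: keep only services whose name appears in the body, if any do.
    named = [n for n in names if n.lower() in lowered]
    return named or list(names)


if __name__ == "__main__":
    print(candidate_services(8000))
    print(candidate_services(8000, '{"owned_by": "vllm"}'))
```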
| Provider | Prefix | Length | Notes |
|---|---|---|---|
| OpenAI | sk-proj- | ~80+ chars | Current format since April 2024 |
| Anthropic | sk-ant-api03- | ~90+ chars | API keys |
| Anthropic OAuth | sk-ant-oat01- | - | OAuth tokens (Files API, etc.) |
| Google AI | AIzaSy | 39 chars | |
| HuggingFace | hf_ | ~34 chars | Read/write access tokens |
| Replicate | r8_ | ~40 chars | - |
| Groq | gsk_ | ~52 chars | - |
# OpenAI project keys
sk-proj-[A-Za-z0-9_-]{80,}
# Anthropic
sk-ant-api03-[A-Za-z0-9_-]{90,}
# HuggingFace
hf_[A-Za-z0-9]{34}
# Replicate
r8_[A-Za-z0-9]{37}
# Groq
gsk_[A-Za-z0-9]{52}
# Google AI (⚠️ also matches Maps, YouTube, etc.)
AIzaSy[A-Za-z0-9_-]{33}
⚠️ Google `AIzaSy` warning: Google uses the same prefix for ALL Cloud APIs. A Maps API key embedded in public JavaScript can silently gain Gemini API access if the Generative Language API is enabled on the same project. This was documented as a significant issue in early 2026.
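The patterns above can be combined into a small scanner for files or logs you are authorized to review. This is a minimal sketch: the regexes are copied from this section, while the names `KEY_PATTERNS`, `find_keys`, and `redact` are hypothetical. Matches are redacted before being returned so that a report does not itself leak the key.

```python
import re

# Regexes copied from the list above; the provider labels are illustrative.
KEY_PATTERNS = {
    "openai_project": re.compile(r"sk-proj-[A-Za-z0-9_-]{80,}"),
    "anthropic": re.compile(r"sk-ant-api03-[A-Za-z0-9_-]{90,}"),
    "huggingface": re.compile(r"hf_[A-Za-z0-9]{34}"),
    "replicate": re.compile(r"r8_[A-Za-z0-9]{37}"),
    "groq": re.compile(r"gsk_[A-Za-z0-9]{52}"),
    # Also matches Maps, YouTube, and other Google Cloud keys.
    "google": re.compile(r"AIzaSy[A-Za-z0-9_-]{33}"),
}


def redact(key: str, keep: int = 8) -> str:
    """Keep only the first `keep` characters of a matched key."""
    return key[:keep] + "..."


def find_keys(text: str) -> list:
    """Return (provider, redacted_key) pairs for every pattern match."""
    hits = []
    for provider, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((provider, redact(match.group(0))))
    return hits


if __name__ == "__main__":
    fake = "hf_" + "a" * 34  # synthetic value, not a real token
    print(find_keys(f"HF_TOKEN={fake}"))
```

A regex match only indicates the right shape; whether a key is live has to be checked separately and only with authorization.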
When exposed, vector databases leak proprietary embeddings, sensitive documents, and internal knowledge bases used in RAG systems.
| Database | List Collections | Extract Data |
|---|---|---|
| Qdrant | GET /collections | POST /collections/{name}/points/scroll |
| Weaviate | GET /v1/schema | GET /v1/objects?class={name} |
| ChromaDB | GET /api/v2/collections | POST /api/v2/collections/{id}/query |
| Milvus | gRPC ListCollections | gRPC Search/Query |
⚠️ ChromaDB API update: v1.0.0+ migrated to `/api/v2`. The legacy `/api/v1/collections` now returns a deprecation error. Update your recon scripts accordingly.
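The endpoint table translates directly into request targets for a recon script. The sketch below only builds the method and URL for the HTTP-based databases, with paths copied from the table; it sends nothing, omits Milvus because it uses gRPC, and the function names are hypothetical. Real deployments may need extra path segments or request bodies that this sketch does not model.

```python
from urllib.parse import quote

# (method, path) pairs copied from the table above.
LIST_ENDPOINTS = {
    "qdrant": ("GET", "/collections"),
    "weaviate": ("GET", "/v1/schema"),
    "chromadb": ("GET", "/api/v2/collections"),
}

# "{ref}" stands for the collection name, class, or id from the table.
EXTRACT_ENDPOINTS = {
    "qdrant": ("POST", "/collections/{ref}/points/scroll"),
    "weaviate": ("GET", "/v1/objects?class={ref}"),
    "chromadb": ("POST", "/api/v2/collections/{ref}/query"),
}


def list_request(db: str, base_url: str) -> tuple:
    """Return (method, url) for listing collections on the given database."""
    method, path = LIST_ENDPOINTS[db.lower()]
    return method, base_url.rstrip("/") + path


def extract_request(db: str, base_url: str, ref: str) -> tuple:
    """Return (method, url) for reading data from one collection."""
    method, template = EXTRACT_ENDPOINTS[db.lower()]
    path = template.replace("{ref}", quote(ref, safe=""))
    return method, base_url.rstrip("/") + path


if __name__ == "__main__":
    print(list_request("qdrant", "http://10.0.0.5:6333/"))
    print(extract_request("weaviate", "http://10.0.0.5:8080", "Docs"))
```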
The Model Context Protocol (MCP) is the most critical emerging attack surface in AI security (2025-2026).
MCP servers connect AI models to shell access, file systems, databases, and APIs. Misconfigurations expose:
- Shell/code execution via prompt injection
- API keys in plaintext in `.env` files readable by agents
- Tool poisoning: hidden instructions in tool metadata
- Supply chain attacks: compromised MCP packages (24,000+ secrets leaked in MCP configs in its first year)
# Shodan
port:18789 "api/v1"
http.title:"Clawdbot" OR http.title:"OpenClaw"
# GitHub
path:mcp.json "apiKey"
path:.cursor/mcp.json
"mcpServers" path:*.json "token"
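The GitHub dorks above look for secrets stored inside `mcpServers` entries. The same check can be run locally against a config you own. This is a minimal sketch, assuming the common layout in which each server entry has an `env` object; the function name, the secret-name heuristic, and the treatment of `${...}` values as placeholders are all assumptions.

```python
import json
import re

# Heuristic for variable names that usually hold credentials.
SECRET_NAME = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)


def find_plaintext_secrets(config_text: str) -> list:
    """Return (server, env_var) pairs whose value looks like an inline secret."""
    config = json.loads(config_text)
    findings = []
    for server, spec in config.get("mcpServers", {}).items():
        env = spec.get("env", {}) if isinstance(spec, dict) else {}
        for key, value in env.items():
            if not isinstance(value, str) or not value:
                continue
            # Assumption: "${VAR}" is a reference resolved elsewhere, not a secret.
            if value.startswith("${"):
                continue
            if SECRET_NAME.search(key):
                findings.append((server, key))
    return findings


if __name__ == "__main__":
    sample = json.dumps({
        "mcpServers": {
            "files": {
                "command": "npx",
                "env": {"OPENAI_API_KEY": "sk-proj-example", "LOG_LEVEL": "debug"},
            }
        }
    })
    print(find_plaintext_secrets(sample))
```

Only variable names are reported, not values, so the output is safe to paste into a ticket.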
| Incident | Date | Impact |
|---|---|---|
| OpenClaw Shodan Exposure | Jan 2026 | 4,000+ agent gateways, many with zero auth and RCE |
| Operation Bizarre Bazaar | Dec 2025 to Jan 2026 | 35,000 attacks on LLM/MCP endpoints; commercial resale |
| Smithery Registry Breach | 2025 | Fly.io token → control of 3,000+ MCP servers |
| mcp-remote CVE-2025-6514 | 2025 | Command injection in 437K+ installs |
| Cursor IDE MCP Trust Issue | 2025 | Persistent RCE via shared repo configs (CVSS 7.2-8.8) |
Full timeline:
threat-intel/THREAT_INTELLIGENCE.md
Full entries:
threat-intel/
- Operation Bizarre Bazaar: first large-scale LLMjacking campaign, with 35K attacks and a commercial marketplace selling stolen AI access
- OpenClaw/Clawdbot Shodan Crisis: CVE-2026-24061, "Localhost Trust" bypass
- ChatGPT/Grok Conversation Indexing: Google indexed thousands of conversations with credentials
- Claude Conversation Indexing: ~600 conversations indexed by Google (Forbes, Sep 2025); 143K+ across all LLMs on Archive.org
- "Claudy Day" Attack Chain: open redirect + prompt injection + Files API exfiltration in claude.ai (Oasis Security, Mar 2026)
- Claude Code Source Map Leak: 512K lines of source code exposed via npm package v2.1.88 (Mar 2026)
- MCP Supply Chain Timeline: 10+ major breaches in the MCP ecosystem
- 175K Ollama Servers Exposed: SentinelOne/Censys study across 130 countries
- AI-Assisted ICS Targeting: 60+ Iranian groups using LLMs for critical infrastructure recon (Feb 2026)
Only AI-specific tools. Full details:
tools/TOOLS.md
For generic OSINT tools (Shodan, Censys, etc.), see cloud_osint.
| Tool | What It Does | By |
|---|---|---|
| Garak | LLM vulnerability scanner, "nmap for LLMs" | NVIDIA |
| PyRIT | AI red teaming framework: Crescendo, jailbreaking, multi-turn | Microsoft |
| promptfoo | LLM pentesting CLI with 133+ attack plugins | OpenAI |
| DeepTeam | 50+ vulns, 20+ attacks, OWASP/MITRE mapping | Confident AI |
| API Radar | Real-time leaked AI API key monitoring on GitHub | Independent |
| KeyLeak Detector | Web scanner for 200+ patterns incl. 15+ AI providers | Independent |
| promptmap | ChatGPT dorks & prompt injection testing | Utku Şen |
| Vulnerable MCP | MCP vulnerability database with CVEs and PoCs | Community |
Full Sigma rules:
detection-rules/SIGMA_RULES.md
7 Sigma detection rules covering:
- External access to Ollama (port 11434)
- AI API keys appearing in application logs
- Unauthorized LLM API access without Bearer tokens
- Exposed MCP servers with `auth_mode: none`
- Vector database unauthorized enumeration
- LLMjacking: anomalous inference spikes (>500 requests/hour)
- AI agent RCE via prompt injection (suspicious tool calls to bash/python_repl)
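The LLMjacking rule in the list above is a threshold over request volume. As a rough illustration of that logic outside Sigma, the sketch below counts requests per source per clock hour and flags any bucket over 500. It assumes events arrive as `(ISO timestamp, source IP)` pairs and uses fixed hourly buckets rather than a sliding window; the function name and input shape are hypothetical and are not taken from the rules file.

```python
from collections import Counter
from datetime import datetime

THRESHOLD = 500  # requests per hour, as stated in the rule description


def find_spikes(events, threshold: int = THRESHOLD) -> dict:
    """Return {(hour, source_ip): count} for buckets above the threshold."""
    buckets = Counter()
    for timestamp, source_ip in events:
        # Truncate each timestamp to the start of its clock hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        buckets[(hour.isoformat(), source_ip)] += 1
    return {key: count for key, count in buckets.items() if count > threshold}


if __name__ == "__main__":
    # 501 synthetic requests from one documentation-range IP within one hour.
    noisy = [
        (f"2026-01-05T10:{i // 60:02d}:{i % 60:02d}", "203.0.113.7")
        for i in range(501)
    ]
    quiet = [("2026-01-05T10:00:00", "198.51.100.2")] * 10
    print(find_spikes(noisy + quiet))
```

A burst that straddles an hour boundary is split across two buckets here, so a production rule may prefer a sliding window.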
| Source | Finding | Link |
|---|---|---|
| SentinelOne + Censys | 175,108 exposed Ollama hosts across 130 countries | SecurityWeek |
| Cisco Talos | 1,100+ Ollama servers, 20% serving models without auth | Cisco Blog |
| Pillar Security | Operation Bizarre Bazaar: 35,000 LLMjacking attacks | Pillar Security |
| GitGuardian | AI credential leaks surged 81% YoY; 29M secrets on GitHub | Report |
| AuthZed | Complete MCP security breaches timeline | Blog |
| Pangea | Sensitive data in indexed ChatGPT histories | Blog |
| Trail of Bits | 8 high-severity vulns in Gradio 5 audit | Blog |
| Oasis Security | "Claudy Day": 3 vulns chained for data exfiltration from Claude | Blog |
| Forbes | ~600 Claude conversations indexed by Google | Article |
| Obsidian Security | 143K+ LLM chats (incl. Claude) on Archive.org | Blog |
| TechRadar | Claude Code 512K source lines leaked via npm | Article |
| Penligent | Claude Code source map leak analysis | Blog |
- OWASP Top 10 for LLM Applications 2025
- OWASP Top 10 for Agentic Applications 2026
- MITRE ATLAS
- MCP Security Best Practices
- vulnerablemcp.info
The resources in this repository are for authorized security testing, education, and legitimate research only.
- Obtain proper authorization before testing infrastructure you do not own
- Follow applicable laws (CFAA, GDPR, local equivalents)
- Report vulnerabilities responsibly through appropriate disclosure channels
- Never exploit discovered misconfigurations for unauthorized access
The authors assume no liability for misuse.
PRs welcome! Please verify dorks/queries work before submitting. See CONTRIBUTING.md.
Code: MIT · Data & Documentation: CC BY-SA 4.0
Maintained by 7WaySecurity · Last updated April 2026
