AI-powered OSINT tool for passive infrastructure discovery and organizational intelligence gathering

dejisec/ReconAgent

ReconAgent

ReconAgent is an OSINT tool that combines passive infrastructure discovery, document metadata extraction, organizational intelligence gathering, and LLM-powered analysis.

Features

  • Passive enumeration via bbot — discovers subdomains, IP addresses, open ports, technologies, email addresses, and cloud storage buckets
  • Google dorking via Serper.dev API — finds exposed documents, directory listings, error pages, and sensitive files
  • Document metadata extraction — downloads discovered documents and extracts author names, internal paths, software versions, and email addresses from PDF, DOCX, XLSX, PPTX, DOC, XLS, and PPT files
  • Organizational reconnaissance — answers six intelligence questions about the target (leadership, business model, history, locations, revenue, corporate structure) using web search and SEC EDGAR filings
  • LLM-powered analysis — optional AI analysis of information gathered on the target

Requirements

  • Python 3.12+
  • uv package manager
  • A Serper.dev API key (free tier: 2,500 queries) — required for dorking and orgrecon
  • An LLM API key (Anthropic, OpenAI, etc.) — optional, needed for orgrecon and AI analysis

Installation

git clone https://github.com/dejisec/reconagent.git && cd reconagent
uv sync

Configuration

Copy the example environment file and update the variables:

cp .env.example .env

General

| Variable | Default | Description |
| --- | --- | --- |
| RECONAGENT_SERPER_API_KEY | (required) | Serper.dev API key for dorking and orgrecon |
| RECONAGENT_OUTPUT_DIR | ./reports | Output directory for HTML reports |
| RECONAGENT_MAX_DOWNLOAD_SIZE_MB | 50 | Max file size to download (MB) |
| RECONAGENT_DOWNLOAD_TIMEOUT | 30 | Download timeout in seconds |
| RECONAGENT_MAX_CONCURRENT_DOWNLOADS | 5 | Parallel download limit (1–20) |
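
As a rough illustration of how these variables might be consumed, the sketch below reads them from a mapping (by default the process environment) and applies the defaults listed above. `GeneralSettings` and `load_general_settings` are hypothetical names chosen for this example, not ReconAgent's actual code, and the sketch assumes the `.env` values have already been exported into the environment.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GeneralSettings:
    """Hypothetical container for the General variables (illustration only)."""
    serper_api_key: str
    output_dir: Path
    max_download_size_mb: int
    download_timeout: int
    max_concurrent_downloads: int


def load_general_settings(env=None):
    """Read the RECONAGENT_* General variables, applying the documented defaults."""
    env = os.environ if env is None else env

    api_key = env.get("RECONAGENT_SERPER_API_KEY", "")
    if not api_key:
        raise ValueError("RECONAGENT_SERPER_API_KEY is required")

    # The documented range for parallel downloads is 1-20.
    concurrent = int(env.get("RECONAGENT_MAX_CONCURRENT_DOWNLOADS", "5"))
    if not 1 <= concurrent <= 20:
        raise ValueError("RECONAGENT_MAX_CONCURRENT_DOWNLOADS must be between 1 and 20")

    return GeneralSettings(
        serper_api_key=api_key,
        output_dir=Path(env.get("RECONAGENT_OUTPUT_DIR", "./reports")),
        max_download_size_mb=int(env.get("RECONAGENT_MAX_DOWNLOAD_SIZE_MB", "50")),
        download_timeout=int(env.get("RECONAGENT_DOWNLOAD_TIMEOUT", "30")),
        max_concurrent_downloads=concurrent,
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in isolation.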

bbot

| Variable | Default | Description |
| --- | --- | --- |
| RECONAGENT_BBOT_EXCLUDE_FLAGS | ["slow","active","deadly"] | bbot module flags to exclude |
| RECONAGENT_BBOT_REQUIRE_FLAGS | ["passive"] | bbot module flags to require |
| RECONAGENT_BBOT_MAX_SCAN_DURATION | 600 | Max bbot scan time in seconds |
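
To make the effect of the two flag lists concrete, here is a minimal sketch of require/exclude filtering. It assumes a module is kept only if it carries every required flag and none of the excluded ones; the module names and the `select_modules` helper are hypothetical and do not come from bbot or ReconAgent.

```python
import json


def select_modules(modules, require_flags, exclude_flags):
    """Return names of modules that have all required flags and no excluded flag."""
    require, exclude = set(require_flags), set(exclude_flags)
    return sorted(
        name
        for name, flags in modules.items()
        if require <= set(flags) and not (exclude & set(flags))
    )


# The defaults from the table are JSON lists, so they can be parsed directly.
require = json.loads('["passive"]')
exclude = json.loads('["slow","active","deadly"]')

# Hypothetical modules and flags, for illustration only.
modules = {
    "mod_a": {"passive", "safe"},
    "mod_b": {"active"},
    "mod_c": {"passive", "slow"},
}

selected = select_modules(modules, require, exclude)
```

With these inputs only `mod_a` is selected: `mod_b` lacks the required `passive` flag and `mod_c` carries the excluded `slow` flag.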

Optional bbot module API keys (e.g. VirusTotal, Shodan, Censys, SecurityTrails) can enhance scan results. See .env.example for the full list.

Dorking

| Variable | Default | Description |
| --- | --- | --- |
| RECONAGENT_DORKING_REQUEST_DELAY | 1.0 | Delay between Serper API calls (seconds) |
| RECONAGENT_DORKING_MAX_RESULTS_PER_QUERY | 10 | Max results per dork query (1–100) |
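
The sketch below shows one way the delay and per-query result cap could interact. The dork templates, `build_dork_queries`, and `run_dorks` are assumptions made for this example; `search` is a caller-supplied function standing in for whatever Serper client is actually used.

```python
import time

# Hypothetical dork templates; the real query set is not documented here.
DORK_TEMPLATES = [
    "site:{domain} filetype:pdf",
    "site:{domain} filetype:docx",
    'site:{domain} intitle:"index of"',
]


def build_dork_queries(domain, templates=DORK_TEMPLATES):
    """Fill each template with the target domain."""
    return [template.format(domain=domain) for template in templates]


def run_dorks(queries, search, delay=1.0, max_results=10, sleep=time.sleep):
    """Run each query through `search`, pausing `delay` seconds between calls."""
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            sleep(delay)  # no pause before the first request
        results[query] = search(query)[:max_results]
    return results
```

Injecting `sleep` and `search` as parameters means the pacing logic can be checked without waiting or touching the network.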

Organizational Recon

| Variable | Default | Description |
| --- | --- | --- |
| RECONAGENT_ORGRECON_MAX_SEARCH_RESULTS | 10 | Max Serper results per question query (1–100) |
| RECONAGENT_ORGRECON_MAX_EXTRACT_URLS | 8 | Max web pages to extract per question (1–20) |
| RECONAGENT_ORGRECON_MAX_CONTENT_LENGTH | 15000 | Max characters per extracted page (1,000–100,000) |
| RECONAGENT_ORGRECON_SERPER_DELAY | 0.5 | Seconds between Serper API calls |
| RECONAGENT_ORGRECON_EDGAR_DELAY | 0.2 | Seconds between SEC EDGAR API calls |
| RECONAGENT_ORGRECON_SKIP_EDGAR | false | Skip SEC EDGAR lookups (for private companies) |

LLM Analysis

| Variable | Default | Description |
| --- | --- | --- |
| RECONAGENT_LLM_PROVIDER | none | LLM provider: anthropic, openai, openai-compat, ollama, none |
| RECONAGENT_LLM_MODEL | "" | Model identifier (e.g. claude-sonnet-4-5-20250929, gpt-4o-mini) |
| RECONAGENT_LLM_API_KEY | "" | API key for the LLM provider (not needed for Ollama) |
| RECONAGENT_LLM_BASE_URL | "" | Base URL for OpenAI-compatible or Ollama providers |
| RECONAGENT_LLM_MAX_TOKENS | 4096 | Maximum tokens per LLM response (256–32,768) |
| RECONAGENT_LLM_TIMEOUT | 120 | Timeout in seconds per LLM call (10–600) |
| RECONAGENT_LLM_TEMPERATURE | 0.2 | Sampling temperature (0.0–2.0, lower = more deterministic) |
Usage

# Full recon (passive infrastructure + orgrecon + LLM analysis)
uv run reconagent example.com

# Infrastructure only (no orgrecon, no LLM)
uv run reconagent example.com --skip-orgrecon --skip-llm

# Orgrecon only (skip bbot and dorking)
uv run reconagent example.com --skip-bbot --skip-dorking

Pipeline Phases

Infrastructure Recon

  1. bbot — Passive reconnaissance discovers subdomains, IPs, open ports, technologies, emails, and storage buckets.
  2. Google Dorking — Queries Serper.dev with targeted dork queries (filetype, sensitive keywords, directory listings, error pages) to find exposed documents.
  3. Document Download — Downloads discovered documents that have extractable file types, with size limits and content-type validation.
  4. Metadata Extraction — Extracts author names, internal paths, software versions, emails, and timestamps from PDF, Office, and legacy Office formats.
  5. LLM Analysis — Optional AI-powered analysis of infrastructure findings using the configured LLM provider. Produces an executive summary, risk assessment, and technical recommendations.
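
The download step's file-type, content-type, and size checks could be sketched roughly as follows. `should_download` is a hypothetical helper; treating an HTML content type as a rejection is an assumption about what "content-type validation" means here, and the extension list mirrors the formats named in the Features section.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types listed as extractable in this README.
EXTRACTABLE = {".pdf", ".docx", ".xlsx", ".pptx", ".doc", ".xls", ".ppt"}


def should_download(url, content_type, content_length, max_size_mb=50):
    """Decide whether a discovered URL is worth downloading."""
    # Use only the URL path so query strings do not hide the extension.
    extension = PurePosixPath(urlparse(url).path).suffix.lower()
    if extension not in EXTRACTABLE:
        return False
    # Assumption: an HTML response for a document URL is an error or login page.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return False
    # content_length may be None when the server does not report a size.
    if content_length is not None and content_length > max_size_mb * 1024 * 1024:
        return False
    return True
```

When the server does not report a size, a real downloader would also need to enforce the limit while streaming the response body, which this sketch does not show.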

Organizational Recon

  1. SEC EDGAR Lookup — Resolves the target company's CIK and ticker via EDGAR full-text search. Pulls 10-K, 10-Q, and DEF 14A filings plus XBRL financial facts. Skipped for private companies.
  2. Six-Question Intelligence Gathering — For each question (leadership, business, history, locations, revenue, structure): generates targeted search queries, executes them via Serper, extracts full-text content from top results, fetches question-specific EDGAR data, and synthesises a structured answer via LLM with source citations and confidence ratings.
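
Once a CIK is known, the EDGAR side of this phase can be approximated as below. The sketch uses the public `data.sec.gov` endpoints, which expect the CIK zero-padded to ten digits; `edgar_urls` and `filter_filings` are hypothetical helpers, and the list-of-dicts shape passed to `filter_filings` is an assumption for illustration, not the raw EDGAR response format.

```python
WANTED_FORMS = ("10-K", "10-Q", "DEF 14A")


def edgar_urls(cik):
    """Build the submissions and XBRL company-facts URLs for a CIK."""
    padded = f"{int(cik):010d}"  # EDGAR expects a 10-digit, zero-padded CIK
    return {
        "submissions": f"https://data.sec.gov/submissions/CIK{padded}.json",
        "company_facts": f"https://data.sec.gov/api/xbrl/companyfacts/CIK{padded}.json",
    }


def filter_filings(filings, forms=WANTED_FORMS):
    """Keep only filings whose form type is one of the wanted forms."""
    return [filing for filing in filings if filing.get("form") in forms]


urls = edgar_urls(320193)
filings = filter_filings([
    {"form": "10-K", "date": "2024-01-01"},
    {"form": "8-K", "date": "2024-02-01"},
    {"form": "DEF 14A", "date": "2024-03-01"},
])
```

Resolving the CIK itself and fetching the JSON are left out; only the URL layout and the form filter are shown.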

Reporting

  1. Risk Scoring — Applies category-specific scoring rules to all collected data, generates prioritised findings and actionable recommendations.
  2. Report Generation — Renders a self-contained HTML report combining infrastructure findings, org intelligence, risk tables, and executive summary.
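
A minimal sketch of category-based scoring, to illustrate the idea behind the Risk Scoring step. The categories, weights, and the `prioritise` function are invented for this example and do not reflect ReconAgent's actual scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    category: str
    detail: str
    score: int = 0


# Hypothetical weights per category; the real rules are not documented here.
CATEGORY_WEIGHTS = {
    "storage_bucket": 8,
    "open_port": 5,
    "document_metadata": 3,
    "email": 2,
}


def prioritise(findings, weights=CATEGORY_WEIGHTS, default=1):
    """Assign each finding its category weight and sort highest score first."""
    for finding in findings:
        finding.score = weights.get(finding.category, default)
    return sorted(findings, key=lambda finding: finding.score, reverse=True)


ranked = prioritise([
    Finding("email", "address found in PDF metadata"),
    Finding("storage_bucket", "publicly listed bucket"),
    Finding("unknown", "uncategorised item"),
])
```

Here the storage bucket finding ranks first, followed by the email finding, with the uncategorised item falling back to the default weight.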
