OmniWordlist Pro — Enterprise Wordlist Generator

Version: 1.1.0
Author: Aaryan Bansal
Language: Python
Repository: https://github.com/AaryanBansal-dev/OmniWordlistPro
Build Status: Actively Maintained
Last Updated: November 18, 2025


⚡ Getting Started (2 minutes)

Prerequisites

  • Python 3.8+ (most systems have this installed)
  • Git (for cloning)
  • Linux/macOS/Windows (all supported)

🚀 Quick Install & Run

# Clone the repository
git clone https://github.com/AaryanBansal-dev/OmniWordlistPro.git
cd OmniWordlistPro

# Install dependencies (optional, for better UI)
pip install click rich

# Run directly - single file, no build needed!
python3 omni.py info

# Or make it executable and run it directly
chmod +x omni.py
./omni.py list-presets

📚 Full Documentation


Overview

OmniWordlist Pro is a production-ready wordlist generation platform written entirely in Python. It combines:

  • Crunch compatibility: Pattern-based generation with charset support (@, %, ^, ,)
  • CUPP integration: 1500+ toggleable fields for personalization
  • 100+ transforms: Leet, homoglyph, emoji, pluralization, keyboard shifts, etc.
  • Enterprise features: Checkpointing, deduplication, compression support
  • Single-file script: No build required, just Python
  • Streaming architecture: Memory-efficient iterator-based generation
  • Multi-format output: TXT, GZIP, BZIP2, LZ4, ZSTD, JSONL, CSV

Perfect for:

  • 🎯 Penetration testers & red teams
  • 🎯 Bug bounty hunters
  • 🎯 Security researchers
  • 🎯 Credential auditing
  • 🎯 Creative wordlist experiments

Core Features (Actually Implemented ✅)

🎯 Generation & Combinatorics

  • Charset-based generation — Custom character sets with Crunch-style patterns
  • Pattern support — @ (lower), % (digit), ^ (symbol), , (upper) expansion
  • Length constraints — Min/max word length control
  • Prefix/suffix support — Prepend/append to all generated tokens
  • Field-based generation — 1500+ fields across 12+ categories
  • Streaming architecture — Memory-efficient token generation
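The pattern placeholders above can be illustrated with a short sketch. This is not the code from omni.py: `PLACEHOLDERS` and `expand_pattern` are hypothetical names, and the symbol set is an assumption because this README does not list which symbols `^` covers.

```python
import itertools
import string

# Placeholder classes as listed above: @ lower, , upper, % digit, ^ symbol.
# The symbol set is an assumption; the tool's own set may differ.
PLACEHOLDERS = {
    "@": string.ascii_lowercase,
    ",": string.ascii_uppercase,
    "%": string.digits,
    "^": "!@#$%^&*",
}


def expand_pattern(pattern):
    """Lazily yield every token that matches a Crunch-style pattern."""
    # Each position is either a placeholder class or a one-character literal.
    pools = [PLACEHOLDERS.get(ch, ch) for ch in pattern]
    for combo in itertools.product(*pools):
        yield "".join(combo)


tokens = list(expand_pattern("pass%%"))
print(len(tokens), tokens[0], tokens[-1])  # 100 pass00 pass99
```

In this sketch a placeholder character can never appear as a literal; a real implementation would need an escape mechanism for that case.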

🔄 Transforms (100+ available)

  • Case transforms — uppercase, lowercase, capitalize, toggle_case, title_case
  • Leet speak — basic, full, random leet variations
  • Homoglyphs — single, random, full expansion
  • Keyboard shifts — adjacent key substitutions
  • Diacritics — expand/strip unicode marks
  • Emoji injection — insertion and random placement
  • Append numbers — suffix with configurable digit patterns
  • String reversal — reverse entire tokens
  • Pluralization — English pluralization rules

Filters & Quality

  • Length validation — Min/max character constraints
  • Charset filtering — Allowlist/blocklist character validation
  • Entropy calculation — Shannon entropy scoring
  • Quality scoring — 0.0-1.0 quality rating system
  • Pronounceability — Basic pronunciation quality checks
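As a reference point for the entropy scoring item, here is a minimal sketch of per-character Shannon entropy computed from a token's own character frequencies. The function name is hypothetical, and the tool may normalize or scale its score differently.

```python
import math
from collections import Counter


def shannon_entropy(token):
    """Shannon entropy in bits per character, from the token's character frequencies."""
    if not token:
        return 0.0
    total = len(token)
    counts = Counter(token)
    # Sum of p * log2(1 / p) over each distinct character.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaa"))  # 0.0
print(shannon_entropy("aabb"))  # 1.0
print(shannon_entropy("abcd"))  # 2.0
```

A token made of one repeated character scores 0, while a token whose characters are all distinct scores log2 of its length.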

💾 Output & Storage

  • Text output — Plain UTF-8 TXT format
  • Compression formats — GZIP, BZIP2, LZ4, ZSTD
  • JSON output — JSONL (one JSON per line)
  • CSV export — Comma-separated values with headers
  • Per-chunk checksums — BLAKE2b integrity verification
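One way to produce per-chunk BLAKE2b checksums while streaming is sketched below. The chunk size, the 16-byte digest length, and the function name are assumptions made for illustration; this README does not specify how omni.py chunks its output.

```python
import hashlib


def chunk_checksums(tokens, chunk_size=1000):
    """Yield (chunk_index, token_count, hex_digest) for each chunk of tokens."""
    hasher = hashlib.blake2b(digest_size=16)
    count = 0
    index = 0
    for token in tokens:
        # Hash the bytes exactly as they would be written: token plus newline.
        hasher.update(token.encode("utf-8") + b"\n")
        count += 1
        if count == chunk_size:
            yield index, count, hasher.hexdigest()
            hasher = hashlib.blake2b(digest_size=16)
            count = 0
            index += 1
    if count:
        # Final, possibly short, chunk.
        yield index, count, hasher.hexdigest()


for entry in chunk_checksums(["a", "b", "c"], chunk_size=2):
    print(entry)
```

Because each digest covers the same bytes that land in the file, a chunk can be verified later by hashing that byte range again.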

🎮 User Interface

  • CLI interface — Full command-line argument support
  • TUI dashboard — Interactive terminal interface (Experimental)
  • Help system — Built-in --help for all commands
  • Preview mode — Sample generation before full job

📋 Presets (5 Built-in)

  1. pentest_default — Standard pentesting wordlist
  2. meme_humor_pack — Creative wordlist with humor
  3. api_dev_wordlist — API endpoint patterns
  4. social_media_usernames — Social media handles
  5. pattern_basic — Crunch-style pattern examples

Usage Examples

Example 1: Basic Generation

# Generate all 3-character combinations from 'abc'
python3 omni.py run --min 3 --max 3 --charset "abc" -o output.txt
# Output (27 tokens): aaa, aab, aac, aba, abb, ... cca, ccb, ccc

# View first 10 lines
head -10 output.txt

Example 2: With Prefix and Suffix

# Wrap every generated token in a fixed prefix and suffix
python3 omni.py run \
  --min 5 \
  --max 10 \
  --charset "abcdefghijklmnopqrstuvwxyz0123456789" \
  --prefix "admin_" \
  --suffix "!2024" \
  -o output.txt

Example 3: Using Presets

# List available presets
python3 omni.py list-presets

# Preview pentest preset (show 50 samples)
python3 omni.py preview --preset pentest_default --sample-size 50

# Generate full wordlist
python3 omni.py run --preset pentest_default -o pentest.txt

Example 4: Compressed Output

# Generate with GZIP compression
python3 omni.py run \
  --charset "abcdefghijklmnopqrstuvwxyz0123456789" \
  --min 6 \
  --max 12 \
  --compress gzip \
  -o wordlist.txt.gz

# Generate with ZSTD (faster compression)
python3 omni.py run \
  --charset "abcdefghijklmnopqrstuvwxyz0123456789" \
  --min 6 \
  --max 12 \
  --compress zstd \
  -o wordlist.txt.zst

# Decompress when needed
gunzip wordlist.txt.gz
zstd -d wordlist.txt.zst

Example 5: JSON Output

# Generate as JSONL (one JSON per line)
python3 omni.py run \
  --charset "abc123" \
  --min 4 \
  --max 6 \
  --format jsonl \
  -o output.jsonl

# View the output
head -5 output.jsonl
# Each line is: {"token":"abc1","entropy":2.3,"length":4}

Example 6: Field-Based Generation

# List all field categories
python3 omni.py fields --categories

# List fields in a specific category
python3 omni.py fields --category personal

# Generate from specific fields (if implemented)
python3 omni.py run \
  --fields first_name_male_0,last_name_0 \
  -o names.txt

Project Structure

OmniWordlistPro/
├── omni.py              # Single-file Python script (~2000 lines)
├── omniwordlist/        # Original modular Python code
│   ├── __init__.py
│   ├── cli.py           # CLI entry point & command handling
│   ├── error.py         # Error types & handling
│   ├── config.py        # Configuration validation
│   ├── charset.py       # Character sets & patterns
│   ├── fields.py        # 1500+ field taxonomy
│   ├── generator.py     # Core streaming generation engine
│   ├── transforms.py    # 100+ transform types
│   ├── filters.py       # Quality & validation filters
│   ├── storage.py       # Output writing & compression
│   └── presets.py       # Preset management
│
├── Documentation/       # All documentation files
│   ├── FEATURES.md      # Feature list & status
│   ├── INSTALL.md       # Installation guide
│   ├── QUICK_START.md   # Command reference
│   └── ...
│
├── requirements.txt     # Python dependencies
├── setup.py             # Python package setup
├── README.md            # This file
└── tests/               # Test suite

Core Components

omni.py — All-in-One Script

  • Single file containing all functionality
  • No build process required
  • Optional dependencies (click, rich) for better UX
  • Can be used directly: python3 omni.py

Character Sets & Patterns

  • Predefined charsets: lowercase, uppercase, digits, symbols
  • Pattern expansion for Crunch compatibility
  • Character set merging and operations

Fields — Field Taxonomy

  • 1500+ available fields across categories
  • Field metadata (type, examples, cardinality)
  • Field dependency tracking

Generator — Streaming Engine

  • Generates combinations of characters/fields
  • Memory-efficient iterator-based approach
  • Support for custom ordering and sampling
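The iterator-based approach can be sketched with `itertools.product`. This is a minimal illustration rather than the generator in omni.py, and the function name and signature are hypothetical.

```python
import itertools


def generate(charset, min_len, max_len, prefix="", suffix=""):
    """Lazily yield every string over charset with length min_len..max_len."""
    for length in range(min_len, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            yield prefix + "".join(combo) + suffix


print(list(generate("ab", 1, 2)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']

# Sampling takes only what is needed; the rest is never materialized.
print(list(itertools.islice(generate("abc", 3, 3), 5)))
# ['aaa', 'aab', 'aac', 'aba', 'abb']
```

Since nothing is stored between tokens, memory use stays flat regardless of how large the keyspace is.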

Transforms — Transformation Pipeline

  • 100+ available transforms
  • Chain transforms together
  • Each transform is deterministic
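Chaining can be pictured as applying a list of pure functions in order. In the sketch below the names `leet_basic` and `capitalize` are borrowed from the config examples in this README, but the leet mapping and the `apply_pipeline` helper are assumptions, not the tool's actual definitions.

```python
# Assumed minimal leet mapping; the tool's leet_basic table may differ.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})


def leet_basic(token):
    return token.translate(LEET_MAP)


def capitalize(token):
    return token.capitalize()


def apply_pipeline(tokens, transforms):
    """Apply each transform in order to every token, lazily."""
    for token in tokens:
        for transform in transforms:
            token = transform(token)
        yield token


print(list(apply_pipeline(["password", "admin"], [leet_basic, capitalize])))
# ['P455w0rd', '4dm1n']
print(list(apply_pipeline(["admin"], [capitalize, leet_basic])))
# ['Adm1n']
```

The two calls show that order matters: capitalizing first protects the leading "A" from a mapping that only covers lowercase letters.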

Filters — Quality & Validation

  • Length constraints
  • Entropy calculations
  • Character set validation
  • Pronounceability scoring

Storage — Output & Compression

  • Multiple output formats: TXT, JSONL, CSV
  • Compression: GZIP, BZIP2, LZ4, ZSTD
  • Streaming writers (no full buffering)
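A streaming writer for the GZIP case can be built on the standard library alone, as in the sketch below. The function name is hypothetical; LZ4 and ZSTD are left out because they require third-party packages this README does not name.

```python
import gzip
import os
import tempfile


def write_gzip(tokens, path):
    """Stream tokens to a gzip file, one per line, and return the count written."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as handle:
        for token in tokens:
            handle.write(token + "\n")
            count += 1
    return count


# Round trip in a temporary directory to show the output is readable.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wordlist.txt.gz")
    written = write_gzip((w for w in ["alpha", "beta", "gamma"]), path)
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        read_back = handle.read().splitlines()

print(written, read_back)  # 3 ['alpha', 'beta', 'gamma']
```

The writer accepts any iterable, so it can consume a generator directly without the token list ever being held in memory.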

Presets — Preset Management

  • Load/save preset configurations
  • Built-in presets for common use cases
  • Preset validation & merging

CLI Command Reference

omni.py run — Generate a wordlist

python3 omni.py run [OPTIONS]

Key options:

  • --min <LEN> — Minimum word length (default: 1)
  • --max <LEN> — Maximum word length (default: 10)
  • --charset <CHARS> — Character set to use (default: lowercase)
  • --prefix <STR> — Prepend to each token
  • --suffix <STR> — Append to each token
  • --preset <NAME> — Use a named preset
  • --compress <FORMAT> — Compress output (gzip, bzip2, lz4, zstd)
  • --format <FMT> — Output format (txt, jsonl, csv)
  • -o, --output <FILE> — Output file path
  • -s, --sample-size <N> — Limit output to N tokens

Example:

python3 omni.py run --min 5 --max 10 --charset "abcdefghijklmnopqrstuvwxyz0123456789" -o wordlist.txt

omni.py preview — Sample generation before full run

python3 omni.py preview [OPTIONS]

Options:

  • --preset <NAME> — Preview a preset
  • --sample-size <N> — Number of samples to show (default: 10)
  • --min <LEN>, --max <LEN> — Length constraints

Example:

python3 omni.py preview --preset pentest_default --sample-size 50

omni.py list-presets — Show available presets

python3 omni.py list-presets

Output:

Available Presets:
1. pentest_default       - Standard pentesting wordlist
2. meme_humor_pack       - Creative with humor
3. api_dev_wordlist      - API endpoint patterns
4. social_media_usernames - Social handles
5. pattern_basic         - Crunch-style patterns

omni.py show-preset — Display preset details

python3 omni.py show-preset <PRESET_NAME>

Example:

python3 omni.py show-preset pentest_default

omni.py fields — Browse available fields

python3 omni.py fields [OPTIONS]

Options:

  • --categories — List all field categories
  • --category <NAME> — List fields in a category
  • --search <QUERY> — Search for fields

Example:

python3 omni.py fields --categories
python3 omni.py fields --category personal

omni.py info — Show version and system info

python3 omni.py info

Output shows:

  • Script version
  • Supported transforms
  • Supported compression formats
  • System information

Configuration (JSON/TOML)

Example JSON Config

{
  "min_length": 8,
  "max_length": 16,
  "charset": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
  "pattern": null,
  "enabled_fields": [
    "first_name_male_0",
    "last_name_0",
    "common_suffix_0"
  ],
  "transforms": [
    "leet_basic",
    "capitalize",
    "append_numbers_4"
  ],
  "filters": {
    "min_len": 8,
    "max_len": 32,
    "charset_filter": "abcdefghijklmnopqrstuvwxyz0123456789"
  },
  "output_file": "./output/wordlist.txt.gz",
  "compression": "gzip",
  "dedupe": true,
  "bloom_fp_rate": 0.01,
  "prefix": "admin",
  "suffix": "!2024",
  "workers": 8
}
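Before a long run it can help to sanity-check a config like the one above. The sketch below is an assumption about what such a check might look like, not the validation omni.py performs; the defaults (1 and 10) and the compression names come from the CLI reference in this README.

```python
import json

# Compression names as listed for --compress in the CLI reference.
KNOWN_COMPRESSION = {"gzip", "bzip2", "lz4", "zstd"}


def load_config(text):
    """Parse a JSON config string and reject obviously inconsistent settings."""
    config = json.loads(text)
    min_len = config.get("min_length", 1)
    max_len = config.get("max_length", 10)
    if min_len < 1 or max_len < min_len:
        raise ValueError(f"invalid length range: {min_len}..{max_len}")
    compression = config.get("compression")
    if compression is not None and compression not in KNOWN_COMPRESSION:
        raise ValueError(f"unknown compression: {compression}")
    return config


config = load_config('{"min_length": 8, "max_length": 16, "compression": "gzip"}')
print(config["min_length"], config["max_length"])  # 8 16
```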

Example TOML Config

min_length = 8
max_length = 16
charset = "abc123"
pattern = "pass%%"
output_file = "./output/wordlist.txt"
compression = "gzip"
dedupe = true

enabled_fields = [
  "first_name_male_0",
  "last_name_0"
]

transforms = [
  "capitalize",
  "append_numbers_4"
]

[filters]
min_len = 8
max_len = 32
charset_filter = "abcdefghijklmnopqrstuvwxyz0123456789"

Performance & Benchmarks

Performance varies based on:

  • Character set size — Larger charsets = more combinations
  • Word length — Longer words = exponentially more combinations
  • Transforms applied — More transforms = slower output
  • System hardware — CPU speed affects iteration speed
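The first two factors can be made concrete: for a charset of size n, exhaustive generation over lengths min..max yields n^min + ... + n^max tokens. A small helper (hypothetical, not part of the CLI) makes it easy to estimate a job before running it.

```python
def keyspace_size(charset_size, min_len, max_len):
    """Number of tokens produced by exhaustive generation over the length range."""
    return sum(charset_size ** length for length in range(min_len, max_len + 1))


print(keyspace_size(3, 3, 3))   # 27, the 'abc' example with --min 3 --max 3
print(keyspace_size(2, 1, 2))   # 6, the 'ab' example with --min 1 --max 2
print(keyspace_size(36, 6, 6))  # 2176782336, lowercase + digits at length 6 only
```

Each extra character of length multiplies the count by the charset size, which is why large exhaustive jobs are usually capped with --sample-size or narrowed with a pattern.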

Typical scenarios on modern hardware:

Scenario              Charset     Length   Tokens   Characteristics
Simple                a-z         3-5      237K     Fast generation
Mixed case + digits   A-Za-z0-9   6-8      2.5M     Moderate speed
With transforms       A-Za-z0-9   6-8      500K     Transform overhead

Memory usage:

  • Base script: ~5-10 MB
  • Typical generation: 20-100 MB (iterator-based, minimal buffering)
  • With deduplication: Grows with unique token count
  • With compression: Depends on compression format

Development Guide

For contributors and developers interested in extending OmniWordlist Pro, see DEVELOPMENT.md for:

  • Building from source
  • Running tests
  • Contributing guidelines
  • Architecture deep-dive
  • Extending with custom transforms

Troubleshooting

Issue: "click" or "rich" module not found

Solution: Install the optional dependencies:

pip install click rich

Issue: Script runs slowly

Solution: Python is interpreted, so it's slower than compiled languages. For massive wordlists, consider:

# Use PyPy for better performance
pypy3 omni.py run --min 3 --max 5 --charset abc -o output.txt

Issue: "Permission denied" when creating output file

Solution: Check directory permissions:

# Make sure output directory exists and is writable
mkdir -p ~/wordlists
chmod 755 ~/wordlists

# Run command with that directory
python3 omni.py run --min 3 --max 5 --charset abc -o ~/wordlists/output.txt

Issue: Output file is empty or missing

Solution: Check if the generation actually ran:

# Preview first
python3 omni.py preview --sample-size 10

# Check what you're generating
python3 omni.py run --min 1 --max 2 --charset "ab" -o test.txt
cat test.txt  # Should show: a, b, aa, ab, ba, bb

Issue: How do I interrupt a long-running job?

Solution: Press Ctrl+C to stop generation.


Contributing

Contributions are welcome! To get started:

  1. Fork the repository on GitHub
  2. Clone your fork locally
  3. Create a feature branch (git checkout -b feature/my-feature)
  4. Make your changes and add tests
  5. Run tests (python3 -m pytest tests/)
  6. Commit and push to your fork
  7. Submit a pull request with a clear description

Areas for contribution:

  • New transforms (more leet variations, emoji sets, etc.)
  • Additional field packs (specific industries, languages)
  • Performance optimizations
  • Documentation improvements
  • Bug fixes
  • Testing edge cases

See DEVELOPMENT.md for detailed contribution guidelines.


License

MIT License — See LICENSE file for details


Support

Getting Help

Documentation Files

Quick Links

1. Project Overview

Mission: Build the single most flexible wordlist generator — capable of producing highly targeted lists for pentesting, research, and creative experiments — by letting users toggle every conceivable field (personal, cultural, technical, humor, memes, music, language, keyboard patterns, encodings, etc.) and combine them with robust transforms and constraints.

Target users:

  • Penetration testers / red teams
  • Bug bounty hunters
  • Security researchers / DFIR teams
  • DevOps/security engineers (credential audit)
  • Power users / hobbyists (creative wordlists, memes)

Core differentiators:

  • Field-driven architecture with granular toggles (≥1,500 fields)
  • Crunch-like charset & template support + CUPP personalization
  • Stream-first generation (no OOM) with resumable checkpoints & dedupe
  • Enterprise features: RBAC, audit logs, encryption, quotas, marketplace

2. Key Concepts & Terminology

  • Field: A discrete source token (e.g., first_name_male, fav_meme_format) that can be enabled/disabled. Fields have metadata (type, examples, cardinality estimate, dependencies).
  • Preset/Profile: Saved selection of fields, transforms, and generation settings.
  • Transform Pipeline: Ordered modifications applied to tokens (leet, homoglyph, transliteration, etc.).
  • Combinator Engine: Produces combinations/permutations across enabled fields following rules (cartesian, sequences, weighted sampling).
  • Sink/Output: Where generated tokens are written (file, S3, API, STDOUT).
  • Checkpoint: Persistent state to resume long jobs deterministically.
  • Rule DSL: Small domain-specific language to express templates, conditional rules, and constraints.

3. Complete Feature Set (summary)

This combines the Core Features previously described and the 120+ advanced features. Key groups:

Generation & Combinatorics

  • Field grouping, adaptive combinator caps, weighted & conditional combinator rules, stochastic generation, reservoir sampling, sequence templates, cross-field exclusion, hierarchical profiles, dynamic rule scoring, constraint solver.

Transforms & Mutations

  • Multi‑pass transforms, context-aware leet, homoglyph injection, keyboard-shift transforms, phonetic/IPA transforms, diacritic expansion/stripping, transliteration (multi-script), emoji injection rules, macro transforms, entropy-guided mutation, Levenshtein fuzzing, pluralization, dialect variations.

Filters & Quality

  • Probabilistic profanity filter, entropy & pronounceability filters, language detection & family filters, visual-similarity, regex sandbox, charset intersection, stopword & frequency filters, token-length histogram enforcement.

Outputs & Storage

  • Chunked gzipped outputs, S3/MinIO multipart streaming, compressed formats (gz/bz2/zstd/7z), signed/encrypted artifacts, content-addressed storage, per-chunk checksums, metadata manifests, diffs between runs, TTL/retention, partial downloads, watermarking, format conversion API.

Performance & Scalability

  • Distributed generation with sharded partitions, autoscale worker pools, GPU-accelerated transforms (ML filters), streaming backpressure, sharded Bloom filters, RocksDB/LMDB external backing, adaptive compression, IO benchmarking suite, parallel transform pipelines.

Reliability & Recovery

  • Multi-level checkpointing (local/remote), deterministic resumes, job snapshotting, checkpoint compaction, canary jobs, restart/backoff strategies, corruption detection & repair, cross-region failover, idempotent job submission, job priority queues.

UX & Developer Tools

  • Visual rule builder, live preview pane, rule DSL with Monaco integration, preset versioning, inline docs & examples, field dependency graph, keyboard-first UI, dark/light theming.

Integrations

  • Hashcat/John exports, direct SCP to pentest VMs, SIEM connectors, Slack/Discord notifications, webhooks, Git integration for presets, GitHub Actions, VSCode extension, browser capture extension, Zapier/Make connectors, REST + GraphQL APIs.

Security & Compliance

  • GDPR erasure, PII scanner, compliance presets, immutable audit logs, RBAC, field-level encryption, per-job ACLs, MFA for admin ops, signed manifests, consent flows for personal fields, export redaction templates, plugin sandbox (WASM).

Monetization & Marketplace

  • Pay-per-job billing, premium rulepacks marketplace, job credits, team seat pricing, usage reports, promo code system, affiliate tracking, demo generator for marketing.

AI & Analytics

  • ML-based seed suggestions, auto-tune transforms using historical hit rates, LLM-driven predicate generation, auto-summarize outputs, semantic dedupe (vector-based), anomaly detection, auto-documentation generation.

Testing & QA

  • Synthetic corpora generator, preflight dry-runs, regression harness, fuzz testing, rule coverage reports, mutation tests, deterministic test seeds, integration tests with Hashcat/John.

Collaboration & Extensibility

  • Shared preset libraries, commenting & review workflows, co-editing, delegated approvals, template ratings, issue tracker integration, plugin marketplace, SDKs (Python, JS, Go), web integration options.

(Full feature flags list is included in repo FEATURES.md — see sample JSON at the end of this doc.)


4. Field Taxonomy & Schema Strategy

We’ll provide the field taxonomy in three layers:

  1. Category — High level (Personal, Language, Humor, Music, Internet/Tech, Numbers, Keyboard, Fantasy, Science, Regional, Style, etc.)
  2. Field Group — Mid-level (Names, Dates, MemeFormats, SongSnippets, IPPatterns, KeyboardWalks, Homoglyphs, etc.)
  3. Field — Granular toggle (e.g., first_name_male, birth_month_name, fav_meme_format, keyboard_walk_qwerty_diag, emoji_single, leet_full).

Field metadata (per field)

{
  "id": "first_name_male",
  "category": "personal",
  "group": "names",
  "type": "string",
  "examples": ["Aaryan","Arjun"],
  "cardinality_estimate": 10000,
  "sensitivity": "low",
  "dependencies": [],
  "conflicts": [],
  "ui_hint": "text,autocomplete",
  "default_enabled": true
}

Strategy to reach 1,500+ fields

  • Start with the 500+ base fields already defined (common templates).
  • Add 70 personality fields (memes/humor) and 120 advanced feature toggles as feature flags.
  • Expand language packs (per language add 50+ fields: stopwords, diacritics, transliteration rules).
  • Add regional variants (per-country city lists, postal codes, area codes).
  • Add domain-specific packs (finance, healthcare—non-sensitive placeholders).
  • Provide an extensible field loader to import CSV/JSON packs so community/corporate customers can drop in their own 1000+ values.
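A field-pack loader along these lines might look like the following sketch. It assumes a pack is a JSON array of objects shaped like the field metadata shown earlier in this section and keeps only a subset of the keys; the `Field` class and `load_field_pack` function are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    id: str
    category: str
    group: str
    examples: List[str] = field(default_factory=list)
    default_enabled: bool = True


def load_field_pack(text):
    """Parse a JSON array of field metadata into Field records keyed by id."""
    loaded = {}
    for entry in json.loads(text):
        record = Field(
            id=entry["id"],
            category=entry["category"],
            group=entry["group"],
            examples=list(entry.get("examples", [])),
            default_enabled=bool(entry.get("default_enabled", True)),
        )
        if record.id in loaded:
            raise ValueError(f"duplicate field id: {record.id}")
        loaded[record.id] = record
    return loaded


pack = '[{"id": "first_name_male", "category": "personal", "group": "names", "examples": ["Aaryan", "Arjun"]}]'
fields_by_id = load_field_pack(pack)
print(fields_by_id["first_name_male"].examples)  # ['Aaryan', 'Arjun']
```

Rejecting duplicate ids at load time keeps a community pack from silently overriding a built-in field.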

5. Architecture

High-level components

  • Frontend (Electron/React) — UI, Visual Rule Builder, Live Preview, Preset management.
  • API Server (FastAPI / Async Python) — Job submission, config validation, preset management, auth.
  • Job Scheduler — Celery/RabbitMQ or Redis Queue; spawns workers (Kubernetes Jobs) for heavy generation.
  • Generator Worker — Core combinator engine (Python), streaming transforms, dedupe (Bloom/RocksDB), checkpointing.
  • Storage — S3/MinIO for outputs, Postgres for metadata, Redis for caches/queues, RocksDB for on-disk dedupe.
  • Auth & Billing — OAuth2, NextAuth + Stripe integration.
  • Observability — Prometheus + Grafana + Loki/ELK.

Data flow (textual diagram)

[User] -> [GUI/CLI] -> [API Server] -> [Config Validator] -> [Scheduler]
    -> [Worker Pool] -> [Transform Engine] -> [Dedupe/Filter] -> [Sink (S3/File)]
    -> [Metadata / Checkpoints (Postgres/SQLite)] -> [Notifications/Webhooks]

Worker internals (pipeline)

  1. Config validation: Check field dependencies, estimate cardinality.
  2. Preflight: If the estimated cardinality exceeds a threshold, run a sample/canary job and prompt the user.
  3. Partitioning: Split combinatorial space across worker shards.
  4. Stream generation: Seed→Combinator→Transforms→Filters→Dedupe→Write chunk.
  5. Checkpoint: Persist last emitted hash per shard.
  6. Finalize: Assemble chunk manifests and upload metadata.

Determinism & Resume

  • Use stable hashing (BLAKE2b) of tokens and deterministic partitioning.
  • Checkpoints store (job_id, shard_id, last_hash, offset, bloom_state) to resume exactly.
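Stable hashing matters here because Python's built-in `hash()` for strings is randomized per process by default, so it cannot place the same token in the same shard across restarts. The sketch below shows one possible hash-based shard assignment with BLAKE2b; the function names and the 8-byte digest are assumptions, not the project's actual scheme.

```python
import hashlib


def shard_for(token, shard_count):
    """Map a token to a shard index that is stable across processes and runs."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


def partition(tokens, shard_count):
    """Group tokens by shard index, preserving input order inside each shard."""
    shards = {index: [] for index in range(shard_count)}
    for token in tokens:
        shards[shard_for(token, shard_count)].append(token)
    return shards


shards = partition(["admin", "root", "guest", "admin"], 4)
print({index: members for index, members in shards.items() if members})
```

Duplicates always land in the same shard under this scheme, which lets each shard deduplicate independently.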

6. APIs & CLI Spec

REST API (sample endpoints)

  • POST /api/v1/jobs — Submit a generation job (body: preset/config).
  • GET /api/v1/jobs/{job_id} — Job status & metadata.
  • GET /api/v1/jobs/{job_id}/chunks — List output chunk manifests.
  • POST /api/v1/presets — Save a preset.
  • GET /api/v1/presets/{id} — Fetch preset.
  • POST /api/v1/presets/{id}/validate — Validate preset & estimate size.
  • POST /api/v1/presets/{id}/sample — Produce a sampled preview (N results).
  • GET /api/v1/features — List feature flags.
  • POST /api/v1/auth/login — OAuth token exchange.

Example POST /api/v1/jobs payload (abridged)

{
  "preset_id": "pentest_default_v1",
  "fields": ["company_name","dev_handles","first_name_male","birth_year"],
  "transforms": ["leet_basic","reverse","append_numbers_4"],
  "filters": {"min_len":8,"max_len":32,"charset":"ascii"},
  "output": {"type":"s3","bucket":"omni-results","path":"jobs/1234/"},
  "schedule": {"priority":"high"},
  "callbacks": {"webhook":"https://hooks.example/job-cb"}
}

CLI (examples)

  • omni run --config preset.json --out ./out.gz --resume
  • omni preview --preset pentest_default_v1 --sample 1000
  • omni export-rule --preset x --format hashcat.rule

7. UX / GUI design & screens

Primary screens

  1. Dashboard — Recent jobs, presets, usage metrics, billing.
  2. Preset Editor (Visual Rule Builder) — Left: field tree with toggles; center: drag/drop rule canvas; right: field examples & dependency hints. Supports Monaco rule editor for DSL.
  3. Live Preview — Stream first N results, show stats (entropy hist, char distribution).
  4. Job Submit — Options for output, retention, encryption, and approvals.
  5. Job Monitor — Per-job progress, per-shard throughput, logs, resume/cancel.
  6. Marketplace — Browse/purchase rulepacks & templates.
  7. Admin — RBAC, audit logs, billing, quotas.

Interaction patterns

  • Progressive disclosure: Most fields collapsed; power users expand groups.
  • Warnings: Cardinality estimates displayed with visual risk (green/yellow/red).
  • Presets: Shareable, versioned, forkable.
  • One-click canary: Run a 1k sample to sanity-check before full job.

8. Data Formats, Outputs & Integrations

  • Primary outputs: newline-separated UTF-8 TXT / gzipped TXT, JSONL (structured tokens), CSV (token + metadata columns), Parquet for analytics.
  • Hashcat/John integration: export flags and rule files; compatible encodings.
  • S3 Hooks: multipart, progress, per-chunk checksum.
  • SIEM: Push job metadata & summary into Splunk/ELK.
  • Notification: Slack/Discord webhooks; optional manual approval flows.

9. Scaling, Reliability & Security Design

Scalability

  • Kubernetes-based worker autoscaling.
  • Partition strategy: split combinatorial axes (e.g., field groups) into shards.
  • Dedupe: sharded Bloom filters persisted to RocksDB for memory efficiency.
  • Hot presets cached in Redis for fast sampling.
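To illustrate the dedupe item, here is a minimal single-shard, in-memory Bloom filter using the standard sizing formulas. It is a sketch under stated assumptions: the class and function names are hypothetical, persistence to RocksDB is omitted, and `fp_rate` plays the role of the `bloom_fp_rate` config key.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity, fp_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2)))
        self.hash_count = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, token):
        data = token.encode("utf-8")
        for i in range(self.hash_count):
            # A different salt per round gives k independent hash functions.
            digest = hashlib.blake2b(data, digest_size=8, salt=i.to_bytes(8, "big")).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, token):
        """Insert token; return True if it was possibly seen before."""
        seen = True
        for pos in self._positions(token):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def dedupe(tokens, capacity, fp_rate=0.01):
    """Yield each token the first time the filter has not seen it."""
    bloom = BloomFilter(capacity, fp_rate)
    for token in tokens:
        if not bloom.add(token):
            yield token


print(list(dedupe(["a", "b", "a", "c", "b"], capacity=100)))
```

The trade-off is that a false positive drops a token that was actually new, so roughly `fp_rate` of unique tokens can be lost once the filter is near capacity, in exchange for memory that does not grow with the token count.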

Reliability

  • Checkpoints every N tokens or per-chunk boundary.
  • Job snapshotting ensures deterministic replay.
  • Canary jobs before full runs; auto-cancel on policy violations.
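Checkpoint-and-resume can be sketched with the standard `sqlite3` module. The table layout below is an assumption loosely based on the checkpoint tuple described under Determinism & Resume (bloom_state is omitted), and all function names are hypothetical.

```python
import itertools
import sqlite3


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "job_id TEXT, shard_id INTEGER, last_hash TEXT, token_offset INTEGER, "
        "PRIMARY KEY (job_id, shard_id))"
    )
    return conn


def save_checkpoint(conn, job_id, shard_id, last_hash, token_offset):
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?, ?)",
            (job_id, shard_id, last_hash, token_offset),
        )


def load_checkpoint(conn, job_id, shard_id):
    """Return (last_hash, token_offset), or None when no checkpoint exists."""
    return conn.execute(
        "SELECT last_hash, token_offset FROM checkpoints "
        "WHERE job_id = ? AND shard_id = ?",
        (job_id, shard_id),
    ).fetchone()


conn = open_store()
save_checkpoint(conn, "job-1", 0, "abc", 4)
_, offset = load_checkpoint(conn, "job-1", 0)
# Resume by skipping the tokens already emitted by the deterministic generator.
remaining = list(itertools.islice(iter("abcdefg"), offset, None))
print(remaining)  # ['e', 'f', 'g']
```

Skipping by offset only reproduces the original run if generation order is deterministic, which is why the two properties are paired in this design.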

Security

  • TLS everywhere, OAuth2 + scopes, per-tenant KMS (Key Management Service) integration.
  • Field-level encryption for sensitive fields.
  • Audit logs immutable via append-only storage.
  • Plugin runtime sandbox using WASM + syscall filter.
  • RBAC enforced on API & GUI.

10. Compliance, Privacy & Legal Considerations

  • GDPR: per-user data erasure endpoint; export redaction templates.
  • PII: PII scanner warns when fields likely to contain PII and requires explicit confirmation.
  • Terms of Service & Acceptable Use: users must confirm legal usage (testing only on authorized targets). Company-level legal team to craft TOS and export controls.
  • Marketplace vetting: review paid rulepacks to avoid malicious content; sign & certify packs.

11. Testing Strategy & QA Plan

  • Unit tests: transforms, combinator engine, filters.
  • Integration tests: end-to-end small job generation, resume flow.
  • Performance tests: IO throughput, Bloom filter scaling, distributed generation.
  • Fuzz tests: malformed DSL, regex edge cases (catastrophic backtracking).
  • Security tests: plugin sandbox escape attempts, RBAC bypass, input validation.
  • CI pipeline: run deterministic seeds, assert sample outputs.
  • Canary & Staging: small production-like cluster for heavy jobs.

12. Roadmap & Milestones (90-day plan, Pentester-first)

Phase 0 — Week 0 (Project Init)

  • Repo bootstrap, basic README, field schema v0 (500 fields), basic CLI.

Phase 1 — Week 1–4 (MVP)

  • Implement streaming single-node generator (clean approach).
  • Implement transforms: leet, reverse, padding, simple homoglyphs.
  • SQLite checkpointing, gz sink, basic Bloom dedupe.
  • CLI commands: run, preview, resume.
  • Docs & sample presets (pentest_default).

Phase 2 — Week 5–8 (Scale & UX)

  • API server (FastAPI), job scheduler (Celery).
  • Electron GUI with Visual Rule Builder (preview pane).
  • Cardinality estimator, canary sample mode, preflight checks.
  • S3 sink, chunking, per-chunk checksums.

Phase 3 — Week 9–12 (Enterprise & Integrations)

  • RBAC, audit logs, KMS integration, GDPR endpoints.
  • Hashcat/John exports, VSCode extension, GitHub Actions integration.
  • Marketplace MVP (upload/publish rulepacks).

Milestones + Deliverables

  • End of Week 4: CLI + streaming generator + sample presets.
  • End of Week 8: GUI Beta + API server + S3 output.
  • End of Week 12: Enterprise features + marketplace MVP.

13. Developer Guide & Folder Structure

/omniwordlist
├─ /docs
├─ /backend
│  ├─ api/              # FastAPI app
│  ├─ jobs/             # Job scheduler + worker code
│  ├─ generator/        # combinator engine, transforms
│  ├─ storage/          # S3 / local storage adapters
│  └─ tests/
├─ /cli
│  └─ omni (entrypoint)
├─ /frontend
│  ├─ electron-app
│  └─ web/
├─ /plugins
├─ /presets
├─ /field-packs
├─ /infrastructure
│  ├─ k8s/
│  └─ terraform/
└─ docker-compose.yml

Key repositories / modules

  • generator/engine.py — pipeline orchestration (seed→transform→filter→sink)
  • generator/transforms/* — specific transforms (leet, homoglyph, phonetic)
  • generator/dedupe/* — Bloom/RocksDB adapters
  • api/jobs.py — job submission & validation
  • cli/omni — CLI wrapper to call API or run local engine
  • frontend/visual-builder — React components & Monaco integration

14. Sample Configs & Presets

Example pentest_default.json

{
  "name": "pentest_default_v1",
  "fields": ["company_name", "dev_handles", "first_name_male", "last_name", "birth_year", "common_suffixes"],
  "transforms": ["leet_basic","append_numbers_best_4","titlecase_names"],
  "filters": {"min_len":8,"max_len":30,"charset":"ascii"},
  "output": {"type":"file","path":"./out/pentest_default_v1.gz"}
}

Example meme_humor_pack.json

{
  "name": "meme_humor_pack",
  "fields": ["fav_meme_format","favorite_joke","favorite_pun","go_to_reaction_emoji"],
  "transforms": ["emoji_insertion","random_case"],
  "filters": {"min_len":3,"max_len":140},
  "output": {"type":"jsonl","path":"./out/meme_pack.jsonl"}
}

15. Monetization & Ops Playbook

  • Pricing Tiers: Free (limited lines/day), Pro ($49/mo), Team ($199/mo), Enterprise (custom).
  • Monetizable features: Pay-per-job heavy exports, premium transforms (ML phonetics), marketplace rulepacks, priority SLA.
  • Marketing: OSS launch of CLI core on GitHub (MIT), blog posts (Hacker News, r/netsec), demos with bug-bounty teams, webinars.
  • Support & Onboarding: Docs + video tutorials, community Discord, paid enterprise onboarding.
  • SLA & Ops: K8s cluster with autoscaling, backups (S3 lifecycle), incident runbooks.

16. Action Items — Immediate (copy/paste checklist)

  1. Create GitHub repo omniwordlist with README.md, CONTRIBUTING.md, and license.
  2. Implement generator/engine.py streaming pipeline (use clean approach code pattern).
  3. Implement transforms: leet_basic, reverse, append_numbers_best_4, emoji_insertion.
  4. Add SQLite checkpointing & Bloom dedupe adapter with RocksDB fallback.
  5. Create presets/pentest_default.json and presets/meme_humor_pack.json.
  6. Build CLI omni run --config and omni preview --sample.
  7. Draft FEATURES.md listing all 120+ advanced feature flags with short descriptions.
  8. Build cardinality estimator & preflight sample runner (1k lines).
  9. Create issue templates and a project board for Phase 1 tasks.
  10. Prepare release v0.1 (CLI only) and an initial demo video.

Appendices

A. Feature Flags Example (JSON)

{
  "features": {
    "adaptive_combinator_cap": true,
    "weighted_combinator_generation": true,
    "homoglyph_injection_engine": true,
    "signed_encrypted_artifacts": false,
    "distributed_generation": false,
    "visual_rule_builder": false,
    "gdpr_data_erasure": false,
    "ml_seed_suggestion": false,
    "plugin_system_wasm": false
  }
}

B. Example Job Lifecycle

  1. User uploads/creates preset.
  2. User requests preview → system runs sample job (1k).
  3. User approves full job → system partitions job, spins up workers.
  4. Workers stream tokens to S3 with per-chunk checksums.
  5. On completion, job metadata + manifest created; user notified.

C. Security & AUP snippet (to include in product)

Users must confirm they have authorization to test any systems, accounts, or targets against which generated wordlists are used. OmniWordlist Pro is a defensive tool. Usage against unauthorized targets is prohibited.

