Version: 1.1.0
Author: Aaryan Bansal
Language: Python
Repository: https://github.com/AaryanBansal-dev/OmniWordlistPro
Build Status: Actively Maintained
Last Updated: November 18, 2025
- Python 3.8+ (most systems have this installed)
- Git (for cloning)
- Linux/macOS/Windows (all supported)
# Clone the repository
git clone https://github.com/AaryanBansal-dev/OmniWordlistPro.git
cd OmniWordlistPro
# Install dependencies (optional, for better UI)
pip install click rich
# Run directly - single file, no build needed!
python3 omni.py info
# Or make it executable and add to PATH
chmod +x omni.py
./omni.py list-presets
- INSTALL.md — Complete installation & troubleshooting guide
- QUICK_START.md — CLI command reference
- DEVELOPMENT.md — Development setup & contribution guide
OmniWordlist Pro is a production-ready wordlist generation platform written entirely in Python. It combines:
- ✅ Crunch compatibility: Pattern-based generation with charset support (@, %, ^, ,)
- ✅ CUPP integration: 1500+ toggleable fields for personalization
- ✅ 100+ transforms: Leet, homoglyph, emoji, pluralization, keyboard shifts, etc.
- ✅ Enterprise features: Checkpointing, deduplication, compression support
- ✅ Single-file script: No build required, just Python
- ✅ Streaming architecture: Memory-efficient iterator-based generation
- ✅ Multi-format output: TXT, GZIP, BZIP2, LZ4, ZSTD, JSONL, CSV
Perfect for:
- 🎯 Penetration testers & red teams
- 🎯 Bug bounty hunters
- 🎯 Security researchers
- 🎯 Credential auditing
- 🎯 Creative wordlist experiments
- ✅ Charset-based generation — Custom character sets with Crunch-style patterns
- ✅ Pattern support — @ (lower), % (digit), ^ (symbol), , (upper) expansion
- ✅ Length constraints — Min/max word length control
- ✅ Prefix/suffix support — Prepend/append to all generated tokens
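As a rough illustration of how Crunch-style placeholders can expand into candidate strings, here is a minimal sketch. The `expand_pattern` helper and the exact symbol set behind `^` are assumptions made for this example and are not taken from omni.py.

```python
import itertools
import string

# Placeholder-to-charset mapping as listed above. The concrete symbol set
# used for "^" is an assumption of this sketch.
PLACEHOLDERS = {
    "@": string.ascii_lowercase,
    ",": string.ascii_uppercase,
    "%": string.digits,
    "^": "!@#$%^&*",
}

def expand_pattern(pattern):
    """Lazily yield every string that matches a Crunch-style pattern."""
    # Any character that is not a placeholder is treated as a literal,
    # i.e. a pool containing only that character.
    pools = [PLACEHOLDERS.get(ch, ch) for ch in pattern]
    for combo in itertools.product(*pools):
        yield "".join(combo)

candidates = list(expand_pattern("pass%%"))
print(len(candidates), candidates[0], candidates[-1])  # 100 pass00 pass99
```

Because the function is a generator, a long pattern can be consumed one token at a time instead of being held in memory.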
- ✅ Field-based generation — 1500+ fields across 12+ categories
- ✅ Streaming architecture — Memory-efficient token generation
- ✅ Case transforms — uppercase, lowercase, capitalize, toggle_case, title_case
- ✅ Leet speak — basic, full, random leet variations
- ✅ Homoglyphs — single, random, full expansion
- ✅ Keyboard shifts — adjacent key substitutions
- ✅ Diacritics — expand/strip unicode marks
- ✅ Emoji injection — insertion and random placement
- ✅ Append numbers — suffix with configurable digit patterns
- ✅ String reversal — reverse entire tokens
- ✅ Pluralization — English pluralization rules
- ✅ Length validation — Min/max character constraints
- ✅ Charset filtering — Allowlist/blocklist character validation
- ✅ Entropy calculation — Shannon entropy scoring
- ✅ Quality scoring — 0.0-1.0 quality rating system
- ✅ Pronounceability — Basic pronunciation quality checks
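Shannon entropy scoring can be sketched in a few lines. The function below measures bits per character from the character frequencies inside a single token; whether omni.py normalizes or weights the score differently is not stated here, so treat this as an assumption.

```python
import math
from collections import Counter

def shannon_entropy(token):
    """Return the Shannon entropy of a token in bits per character."""
    if not token:
        return 0.0
    total = len(token)
    counts = Counter(token)
    # H = sum over characters of p * log2(1 / p), with p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(shannon_entropy("aaaa"))  # 0.0
print(shannon_entropy("aabb"))  # 1.0
print(shannon_entropy("abcd"))  # 2.0
```

A filter built on this could, for example, drop tokens whose entropy falls below a chosen threshold.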
- ✅ Text output — Plain UTF-8 TXT format
- ✅ Compression formats — GZIP, BZIP2, LZ4, ZSTD
- ✅ JSON output — JSONL (one JSON per line)
- ✅ CSV export — Comma-separated values with headers
- ✅ Per-chunk checksums — BLAKE2b integrity verification
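Per-chunk BLAKE2b checksums can be computed while tokens stream past, without buffering a whole chunk. The sketch below uses the standard library's `hashlib.blake2b`; the `chunk_checksums` helper, the chunk size, and the newline-delimited encoding are assumptions for illustration.

```python
import hashlib

def chunk_checksums(tokens, chunk_size=1000):
    """Yield (chunk_index, hex_digest) for consecutive chunks of tokens."""
    hasher = hashlib.blake2b(digest_size=32)
    count = 0
    index = 0
    for token in tokens:
        # Hash exactly the bytes that would be written to a TXT output.
        hasher.update(token.encode("utf-8") + b"\n")
        count += 1
        if count == chunk_size:
            yield index, hasher.hexdigest()
            hasher = hashlib.blake2b(digest_size=32)
            count = 0
            index += 1
    if count:
        # Final, partially filled chunk.
        yield index, hasher.hexdigest()

for index, digest in chunk_checksums(["alpha", "beta", "gamma"], chunk_size=2):
    print(index, digest)
```

Verifying a chunk later only requires re-hashing its bytes and comparing the digest.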
- ✅ CLI interface — Full command-line argument support
- ✅ TUI dashboard — Interactive terminal interface (Experimental)
- ✅ Help system — Built-in --help for all commands
- ✅ Preview mode — Sample generation before full job
- pentest_default — Standard pentesting wordlist
- meme_humor_pack — Creative wordlist with humor
- api_dev_wordlist — API endpoint patterns
- social_media_usernames — Social media handles
- pattern_basic — Crunch-style pattern examples
# Generate all 3-character combinations from 'abc'
python3 omni.py run --min 3 --max 3 --charset "abc" -o output.txt
# Output: aaa, aab, aac, aba, abb, ... bca, bcb, bcc
# View first 10 lines
head -10 output.txt
# Generate with a prefix and suffix
python3 omni.py run \
--min 5 \
--max 10 \
--charset "abcdefghijklmnopqrstuvwxyz0123456789" \
--prefix "admin_" \
--suffix "!2024" \
-o output.txt
# List available presets
python3 omni.py list-presets
# Preview pentest preset (show 50 samples)
python3 omni.py preview --preset pentest_default --sample-size 50
# Generate full wordlist
python3 omni.py run --preset pentest_default -o pentest.txt
# Generate with GZIP compression
python3 omni.py run \
--charset "abcdefghijklmnopqrstuvwxyz0123456789" \
--min 6 \
--max 12 \
--compress gzip \
-o wordlist.txt.gz
# Generate with ZSTD (faster compression)
python3 omni.py run \
--charset "abcdefghijklmnopqrstuvwxyz0123456789" \
--min 6 \
--max 12 \
--compress zstd \
-o wordlist.txt.zst
# Decompress when needed
gunzip wordlist.txt.gz
zstd -d wordlist.txt.zst
# Generate as JSONL (one JSON per line)
python3 omni.py run \
--charset "abc123" \
--min 4 \
--max 6 \
--format jsonl \
-o output.jsonl
# View the output
cat output.jsonl | head -5
# Each line is: {"token":"abc1","entropy":2.3,"length":4}
# List all field categories
python3 omni.py fields --categories
# List fields in a specific category
python3 omni.py fields --category personal
# Generate from specific fields (if implemented)
python3 omni.py run \
--fields first_name_male_0,last_name_0 \
-o names.txt
OmniWordlistPro/
├── omni.py # Single-file Python script (~2000 lines)
├── omniwordlist/ # Original modular Python code
│ ├── __init__.py
│ ├── cli.py # CLI entry point & command handling
│ ├── error.py # Error types & handling
│ ├── config.py # Configuration validation
│ ├── charset.py # Character sets & patterns
│ ├── fields.py # 1500+ field taxonomy
│ ├── generator.py # Core streaming generation engine
│ ├── transforms.py # 100+ transform types
│ ├── filters.py # Quality & validation filters
│ ├── storage.py # Output writing & compression
│ └── presets.py # Preset management
│
├── Documentation/ # All documentation files
│ ├── FEATURES.md # Feature list & status
│ ├── INSTALL.md # Installation guide
│ ├── QUICK_START.md # Command reference
│ └── ...
│
├── requirements.txt # Python dependencies
├── setup.py # Python package setup
├── README.md # This file
└── tests/ # Test suite
- Single file containing all functionality
- No build process required
- Optional dependencies (click, rich) for better UX
- Can be used directly: python3 omni.py
- Predefined charsets: lowercase, uppercase, digits, symbols
- Pattern expansion for Crunch compatibility
- Character set merging and operations
- 1500+ available fields across categories
- Field metadata (type, examples, cardinality)
- Field dependency tracking
- Generates combinations of characters/fields
- Memory-efficient iterator-based approach
- Support for custom ordering and sampling
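The iterator-based approach can be illustrated with `itertools.product`. This is a minimal sketch of the idea, not the engine in omni.py; the `generate` function and its parameters are hypothetical names chosen to mirror the CLI options.

```python
import itertools

def generate(charset, min_len, max_len, prefix="", suffix=""):
    """Lazily yield every combination of charset for each length in range."""
    for length in range(min_len, max_len + 1):
        # product() produces one tuple at a time, so memory use stays flat
        # no matter how large the keyspace is.
        for combo in itertools.product(charset, repeat=length):
            yield prefix + "".join(combo) + suffix

print(list(generate("ab", 1, 2)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']

# Sampling the first few tokens never materializes the full list.
print(list(itertools.islice(generate("abc", 3, 3, prefix="admin_"), 3)))
# ['admin_aaa', 'admin_aab', 'admin_aac']
```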
- 100+ available transforms
- Chain transforms together
- Each transform is deterministic
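Chaining deterministic transforms amounts to applying a list of functions in order. The sketch below borrows the names `leet_basic` and `capitalize` from the sample configuration, but the substitution table and the `apply_chain` helper are assumptions for illustration only.

```python
from functools import reduce

# Hypothetical substitution table; the real leet_basic mapping may differ.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def leet_basic(token):
    return token.translate(LEET_MAP)

def capitalize(token):
    return token.capitalize()

def append_numbers(token, suffix="2024"):
    return token + suffix

def apply_chain(token, transforms):
    """Apply each transform in order, feeding the result into the next."""
    return reduce(lambda value, fn: fn(value), transforms, token)

print(apply_chain("password", [capitalize, leet_basic, append_numbers]))
# P455w0rd2024
```

Since every step is a pure function of its input, the same token and chain always produce the same output, which is what makes resumable jobs reproducible.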
- Length constraints
- Entropy calculations
- Character set validation
- Pronounceability scoring
- Multiple output formats: TXT, JSONL, CSV
- Compression: GZIP, BZIP2, LZ4, ZSTD
- Streaming writers (no full buffering)
- Load/save preset configurations
- Built-in presets for common use cases
- Preset validation & merging
python3 omni.py run [OPTIONS]
Key options:
- --min <LEN> — Minimum word length (default: 1)
- --max <LEN> — Maximum word length (default: 10)
- --charset <CHARS> — Character set to use (default: lowercase)
- --prefix <STR> — Prepend to each token
- --suffix <STR> — Append to each token
- --preset <NAME> — Use a named preset
- --compress <FORMAT> — Compress output (gzip, bzip2, lz4, zstd)
- --format <FMT> — Output format (txt, jsonl, csv)
- -o, --output <FILE> — Output file path
- -s, --sample-size <N> — Limit output to N tokens
Example:
python3 omni.py run --min 5 --max 10 --charset "abcdefghijklmnopqrstuvwxyz0123456789" -o wordlist.txt
python3 omni.py preview [OPTIONS]
Options:
- --preset <NAME> — Preview a preset
- --sample-size <N> — Number of samples to show (default: 10)
- --min <LEN>, --max <LEN> — Length constraints
Example:
python3 omni.py preview --preset pentest_default --sample-size 50
python3 omni.py list-presets
Output:
Available Presets:
1. pentest_default - Standard pentesting wordlist
2. meme_humor_pack - Creative with humor
3. api_dev_wordlist - API endpoint patterns
4. social_media_usernames - Social handles
5. pattern_basic - Crunch-style patterns
python3 omni.py show-preset <PRESET_NAME>
Example:
python3 omni.py show-preset pentest_default
python3 omni.py fields [OPTIONS]
Options:
- --categories — List all field categories
- --category <NAME> — List fields in a category
- --search <QUERY> — Search for fields
Example:
python3 omni.py fields --categories
python3 omni.py fields --category personal
python3 omni.py info
Output shows:
- Script version
- Supported transforms
- Supported compression formats
- System information
{
"min_length": 8,
"max_length": 16,
"charset": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
"pattern": null,
"enabled_fields": [
"first_name_male_0",
"last_name_0",
"common_suffix_0"
],
"transforms": [
"leet_basic",
"capitalize",
"append_numbers_4"
],
"filters": {
"min_len": 8,
"max_len": 32,
"charset_filter": "abcdefghijklmnopqrstuvwxyz0123456789"
},
"output_file": "./output/wordlist.txt.gz",
"compression": "gzip",
"dedupe": true,
"bloom_fp_rate": 0.01,
"prefix": "admin",
"suffix": "!2024",
"workers": 8
}
min_length = 8
max_length = 16
charset = "abc123"
pattern = "pass%%"
output_file = "./output/wordlist.txt"
compression = "gzip"
dedupe = true
enabled_fields = [
"first_name_male_0",
"last_name_0"
]
transforms = [
"capitalize",
"append_numbers_4"
]
[filters]
min_len = 8
max_len = 32
charset_filter = "abcdefghijklmnopqrstuvwxyz0123456789"
Performance varies based on:
- Character set size — Larger charsets = more combinations
- Word length — Longer words = exponentially more combinations
- Transforms applied — More transforms = slower output
- System hardware — CPU speed affects iteration speed
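To see how quickly the number of combinations grows with charset size and word length, the size of a full enumeration can be computed before starting a job. The `keyspace_size` helper below is a hypothetical name for this sketch and is not part of the CLI.

```python
def keyspace_size(charset_size, min_len, max_len):
    """Count all strings over a charset for lengths min_len..max_len."""
    # Each length L contributes charset_size ** L combinations.
    return sum(charset_size ** length for length in range(min_len, max_len + 1))

print(keyspace_size(3, 3, 3))    # 27, e.g. charset "abc" at length 3
print(keyspace_size(2, 1, 2))    # 6,  e.g. charset "ab" at lengths 1-2
print(keyspace_size(36, 6, 12))  # on the order of 10**18 for a-z0-9, lengths 6-12
```

A count like the last one is a strong hint to combine the run with --sample-size, a preset, or a narrower pattern rather than enumerating everything.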
Typical scenarios on modern hardware:
| Scenario | Charset | Length | Tokens | Characteristics |
|---|---|---|---|---|
| Simple | a-z | 3-5 | 237K | Fast generation |
| Mixed case + digits | A-Za-z0-9 | 6-8 | 2.5M | Moderate speed |
| With transforms | A-Za-z0-9 | 6-8 | 500K | Transform overhead |
Memory usage:
- Base script: ~5-10 MB
- Typical generation: 20-100 MB (iterator-based, minimal buffering)
- With deduplication: Grows with unique token count
- With compression: Depends on compression format
For contributors and developers interested in extending OmniWordlist Pro, see DEVELOPMENT.md for:
- Building from source
- Running tests
- Contributing guidelines
- Architecture deep-dive
- Extending with custom transforms
Solution: Install the optional dependencies:
pip install click rich
Solution: Python is interpreted, so it's slower than compiled languages. For massive wordlists, consider:
# Use PyPy for better performance
pypy3 omni.py run --min 3 --max 5 --charset abc -o output.txt
Solution: Check directory permissions:
# Make sure output directory exists and is writable
mkdir -p ~/wordlists
chmod 755 ~/wordlists
# Run command with that directory
python3 omni.py run --min 3 --max 5 --charset abc -o ~/wordlists/output.txt
Solution: Check if the generation actually ran:
# Preview first
python3 omni.py preview --sample-size 10
# Check what you're generating
python3 omni.py run --min 1 --max 2 --charset "ab" -o test.txt
cat test.txt # Should show: a, b, aa, ab, ba, bb
Solution: Press Ctrl+C to stop generation.
Contributions are welcome! To get started:
- Fork the repository on GitHub
- Clone your fork locally
- Create a feature branch (git checkout -b feature/my-feature)
- Make your changes and add tests
- Run tests (python3 -m pytest tests/)
- Commit and push to your fork
- Submit a pull request with a clear description
- New transforms (more leet variations, emoji sets, etc.)
- Additional field packs (specific industries, languages)
- Performance optimizations
- Documentation improvements
- Bug fixes
- Testing edge cases
See DEVELOPMENT.md for detailed contribution guidelines.
MIT License — See LICENSE file for details
- Documentation: See README.md, INSTALL.md, QUICK_START.md
- Issues: Report bugs on GitHub Issues
- Discussions: Join discussions on GitHub Discussions
- README.md ← You are here
- INSTALL.md — Installation & troubleshooting
- QUICK_START.md — Command quick reference
- DEVELOPMENT.md — For developers & contributors
- FEATURES.md — Feature list & implementation status
- Repository: https://github.com/AaryanBansal-dev/OmniWordlistPro
- Releases: https://github.com/AaryanBansal-dev/OmniWordlistPro/releases
- Issues: https://github.com/AaryanBansal-dev/OmniWordlistPro/issues
Mission: Build the single most flexible wordlist generator — capable of producing highly targeted lists for pentesting, research, and creative experiments — by letting users toggle every conceivable field (personal, cultural, technical, humor, memes, music, language, keyboard patterns, encodings, etc.) and combine them with robust transforms and constraints.
Target users:
- Penetration testers / red teams
- Bug bounty hunters
- Security researchers / DFIR teams
- DevOps/security engineers (credential audit)
- Power users / hobbyists (creative wordlists, memes)
Core differentiators:
- Field-driven architecture with granular toggles (≥1,500 fields)
- Crunch-like charset & template support + CUPP personalization
- Stream-first generation (no OOM) with resumable checkpoints & dedupe
- Enterprise features: RBAC, audit logs, encryption, quotas, marketplace
- Field: A discrete source token (e.g., first_name_male, fav_meme_format) that can be enabled/disabled. Fields have metadata (type, examples, cardinality estimate, dependencies).
- Preset/Profile: Saved selection of fields, transforms, and generation settings.
- Transform Pipeline: Ordered modifications applied to tokens (leet, homoglyph, transliteration, etc.).
- Combinator Engine: Produces combinations/permutations across enabled fields following rules (cartesian, sequences, weighted sampling).
- Sink/Output: Where generated tokens are written (file, S3, API, STDOUT).
- Checkpoint: Persistent state to resume long jobs deterministically.
- Rule DSL: Small domain-specific language to express templates, conditional rules, and constraints.
This combines the Core Features previously described and the 120+ advanced features. Key groups:
- Field grouping, adaptive combinator caps, weighted & conditional combinator rules, stochastic generation, reservoir sampling, sequence templates, cross-field exclusion, hierarchical profiles, dynamic rule scoring, constraint solver.
- Multi‑pass transforms, context-aware leet, homoglyph injection, keyboard-shift transforms, phonetic/IPA transforms, diacritic expansion/stripping, transliteration (multi-script), emoji injection rules, macro transforms, entropy-guided mutation, Levenshtein fuzzing, pluralization, dialect variations.
- Probabilistic profanity filter, entropy & pronounceability filters, language detection & family filters, visual-similarity, regex sandbox, charset intersection, stopword & frequency filters, token-length histogram enforcement.
- Chunked gzipped outputs, S3/MinIO multipart streaming, compressed formats (gz/bz2/zstd/7z), signed/encrypted artifacts, content-addressed storage, per-chunk checksums, metadata manifests, diffs between runs, TTL/retention, partial downloads, watermarking, format conversion API.
- Distributed generation with sharded partitions, autoscale worker pools, GPU-accelerated transforms (ML filters), streaming backpressure, sharded Bloom filters, RocksDB/LMDB external backing, adaptive compression, IO benchmarking suite, parallel transform pipelines.
- Multi-level checkpointing (local/remote), deterministic resumes, job snapshotting, checkpoint compaction, canary jobs, restart/backoff strategies, corruption detection & repair, cross-region failover, idempotent job submission, job priority queues.
- Visual rule builder, live preview pane, rule DSL with Monaco integration, preset versioning, inline docs & examples, field dependency graph, keyboard-first UI, dark/light theming.
- Hashcat/John exports, direct SCP to pentest VMs, SIEM connectors, Slack/Discord notifications, webhooks, Git integration for presets, GitHub Actions, VSCode extension, browser capture extension, Zapier/Make connectors, REST + GraphQL APIs.
- GDPR erasure, PII scanner, compliance presets, immutable audit logs, RBAC, field-level encryption, per-job ACLs, MFA for admin ops, signed manifests, consent flows for personal fields, export redaction templates, plugin sandbox (WASM).
- Pay-per-job billing, premium rulepacks marketplace, job credits, team seat pricing, usage reports, promo code system, affiliate tracking, demo generator for marketing.
- ML-based seed suggestions, auto-tune transforms using historical hit rates, LLM-driven predicate generation, auto-summarize outputs, semantic dedupe (vector-based), anomaly detection, auto-documentation generation.
- Synthetic corpora generator, preflight dry-runs, regression harness, fuzz testing, rule coverage reports, mutation tests, deterministic test seeds, integration tests with Hashcat/John.
- Shared preset libraries, commenting & review workflows, co-editing, delegated approvals, template ratings, issue tracker integration, plugin marketplace, SDKs (Python, JS, Go), web integration options.
(Full feature flags list is included in repo FEATURES.md — see sample JSON at the end of this doc.)
We’ll provide the field taxonomy in three layers:
- Category — High level (Personal, Language, Humor, Music, Internet/Tech, Numbers, Keyboard, Fantasy, Science, Regional, Style, etc.)
- Field Group — Mid-level (Names, Dates, MemeFormats, SongSnippets, IPPatterns, KeyboardWalks, Homoglyphs, etc.)
- Field — Granular toggle (e.g., first_name_male, birth_month_name, fav_meme_format, keyboard_walk_qwerty_diag, emoji_single, leet_full).
{
"id": "first_name_male",
"category": "personal",
"group": "names",
"type": "string",
"examples": ["Aaryan","Arjun"],
"cardinality_estimate": 10000,
"sensitivity": "low",
"dependencies": [],
"conflicts": [],
"ui_hint": "text,autocomplete",
"default_enabled": true
}
- Start with the 500+ base fields already defined (common templates).
- Add 70 personality fields (memes/humor) and 120 advanced feature toggles as feature flags.
- Expand language packs (per language add 50+ fields: stopwords, diacritics, transliteration rules).
- Add regional variants (per-country city lists, postal codes, area codes).
- Add domain-specific packs (finance, healthcare—non-sensitive placeholders).
- Provide extensible field loader to import CSV/JSON packs so community/corporate customers can drop in their own 1000+ values.
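An extensible field loader could be as small as the sketch below, which groups values by field id from a CSV pack. The two-column layout (`field_id`, `value`), the `load_field_pack` name, and the sample field ids are all assumptions for illustration; the actual pack format is not defined in this document.

```python
import csv
import io

def load_field_pack(handle):
    """Group values by field id from a CSV with 'field_id' and 'value' columns."""
    pack = {}
    for row in csv.DictReader(handle):
        pack.setdefault(row["field_id"], []).append(row["value"])
    return pack

# An in-memory file stands in for a community-supplied pack on disk.
sample = io.StringIO(
    "field_id,value\n"
    "city_in,Mumbai\n"
    "city_in,Delhi\n"
    "area_code_in,011\n"
)
pack = load_field_pack(sample)
print(pack)
# {'city_in': ['Mumbai', 'Delhi'], 'area_code_in': ['011']}
```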
- Frontend (Electron/React) — UI, Visual Rule Builder, Live Preview, Preset management.
- API Server (FastAPI / Async Python) — Job submission, config validation, preset management, auth.
- Job Scheduler — Celery/RabbitMQ or Redis Queue; spawns workers (Kubernetes Jobs) for heavy generation.
- Generator Worker — Core combinator engine (Python), streaming transforms, dedupe (Bloom/RocksDB), checkpointing.
- Storage — S3/MinIO for outputs, Postgres for metadata, Redis for caches/queues, RocksDB for on-disk dedupe.
- Auth & Billing — OAuth2, NextAuth + Stripe integration.
- Observability — Prometheus + Grafana + Loki/ELK.
[User] -> [GUI/CLI] -> [API Server] -> [Config Validator] -> [Scheduler]
-> [Worker Pool] -> [Transform Engine] -> [Dedupe/Filter] -> [Sink (S3/File)]
-> [Metadata / Checkpoints (Postgres/SQLite)] -> [Notifications/Webhooks]
- Config validation: Check field dependencies, estimate cardinality.
- Preflight: If >threshold, run sample/canary job and prompt user.
- Partitioning: Split combinatorial space across worker shards.
- Stream generation: Seed→Combinator→Transforms→Filters→Dedupe→Write chunk.
- Checkpoint: Persist last emitted hash per shard.
- Finalize: Assemble chunk manifests and upload metadata.
- Use stable hashing (BLAKE2b) of tokens and deterministic partitioning.
- Checkpoints store (job_id, shard_id, last_hash, offset, bloom_state) to resume exactly.
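Deterministic partitioning with a stable hash can be sketched as follows. Python's built-in `hash()` is salted per process for strings, so it cannot be used for this; BLAKE2b from `hashlib` gives the same value on every run and every machine. The helper names and the 8-byte digest size are assumptions of this example.

```python
import hashlib

def token_hash(token):
    """Stable 8-byte BLAKE2b digest of a token."""
    return hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()

def shard_for(token, num_shards):
    """Map a token to a shard index that is identical across runs."""
    return int.from_bytes(token_hash(token), "big") % num_shards

tokens = ["admin", "root", "guest", "operator"]
assignment = {token: shard_for(token, 4) for token in tokens}
print(assignment)
```

Because the mapping depends only on the token bytes and the shard count, a restarted worker can recompute which tokens belong to its shard and continue from its stored offset.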
- POST /api/v1/jobs — Submit a generation job (body: preset/config).
- GET /api/v1/jobs/{job_id} — Job status & metadata.
- GET /api/v1/jobs/{job_id}/chunks — List output chunk manifests.
- POST /api/v1/presets — Save a preset.
- GET /api/v1/presets/{id} — Fetch preset.
- POST /api/v1/presets/{id}/validate — Validate preset & estimate size.
- POST /api/v1/presets/{id}/sample — Produce a sampled preview (N results).
- GET /api/v1/features — List feature flags.
- POST /api/v1/auth/login — OAuth token exchange.
{
"preset_id": "pentest_default_v1",
"fields": ["company_name","dev_handles","first_name_male","birth_year"],
"transforms": ["leet_basic","reverse","append_numbers_4"],
"filters": {"min_len":8,"max_len":32,"charset":"ascii"},
"output": {"type":"s3","bucket":"omni-results","path":"jobs/1234/"},
"schedule": {"priority":"high"},
"callbacks": {"webhook":"https://hooks.example/job-cb"}
}
omni run --config preset.json --out ./out.gz --resume
omni preview --preset pentest_default_v1 --sample 1000
omni export-rule --preset x --format hashcat.rule
- Dashboard — Recent jobs, presets, usage metrics, billing.
- Preset Editor (Visual Rule Builder) — Left: field tree with toggles; center: drag/drop rule canvas; right: field examples & dependency hints. Supports Monaco rule editor for DSL.
- Live Preview — Stream first N results, show stats (entropy hist, char distribution).
- Job Submit — Options for output, retention, encryption, and approvals.
- Job Monitor — Per-job progress, per-shard throughput, logs, resume/cancel.
- Marketplace — Browse/purchase rulepacks & templates.
- Admin — RBAC, audit logs, billing, quotas.
- Progressive disclosure: Most fields collapsed; power users expand groups.
- Warnings: Cardinality estimates displayed with visual risk (green/yellow/red).
- Presets: Shareable, versioned, forkable.
- One-click canary: Run a 1k sample to sanity-check before full job.
- Primary outputs: newline-separated UTF-8 TXT / gzipped TXT, JSONL (structured tokens), CSV (token + metadata columns), Parquet for analytics.
- Hashcat/John integration: export flags and rule files; compatible encodings.
- S3 Hooks: multipart, progress, per-chunk checksum.
- SIEM: Push job metadata & summary into Splunk/ELK.
- Notification: Slack/Discord webhooks; optional manual approval flows.
- Kubernetes-based worker autoscaling.
- Partition strategy: split combinatorial axes (e.g., field groups) into shards.
- Dedupe: sharded Bloom filters persisted to RocksDB for memory efficiency.
- Hot presets cached in Redis for fast sampling.
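Bloom-filter deduplication trades a small false-positive rate for constant memory per token. The class below is a minimal in-memory sketch using the standard sizing formulas and double hashing over a BLAKE2b digest; sharding and RocksDB persistence are left out, and all names are hypothetical.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, expected_items, fp_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes
        self.size = max(8, int(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, token):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, token):
        """Insert token; return True if it was (probably) seen before."""
        seen = True
        for pos in self._positions(token):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen

def dedupe(tokens, bloom):
    """Yield only tokens the filter has not seen yet."""
    for token in tokens:
        if not bloom.add(token):
            yield token

bloom = BloomFilter(expected_items=1000, fp_rate=0.01)
print(list(dedupe(["a", "b", "a", "c", "b"], bloom)))  # ['a', 'b', 'c']
```

A false positive here means a unique token is occasionally dropped, which is why the target rate (the bloom_fp_rate setting in the sample configuration) should be chosen with the job size in mind.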
- Checkpoints every N tokens or per-chunk boundary.
- Job snapshotting ensures deterministic replay.
- Canary jobs before full runs; auto-cancel on policy violations.
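Checkpointing every N tokens and resuming from the stored offset could look like the sketch below, which uses an in-memory SQLite database. The table layout, the `token_offset` column name, and the `run_shard` helper are assumptions for illustration; bloom state is omitted to keep the example short.

```python
import hashlib
import itertools
import sqlite3

def open_checkpoints(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "job_id TEXT, shard_id INTEGER, last_hash TEXT, token_offset INTEGER, "
        "PRIMARY KEY (job_id, shard_id))"
    )
    return conn

def load_offset(conn, job_id, shard_id):
    row = conn.execute(
        "SELECT token_offset FROM checkpoints WHERE job_id = ? AND shard_id = ?",
        (job_id, shard_id),
    ).fetchone()
    return row[0] if row else 0  # 0 when the shard has never checkpointed

def run_shard(tokens, sink, conn, job_id, shard_id, every=2):
    """Write tokens to sink, persisting a checkpoint every `every` tokens."""
    offset = load_offset(conn, job_id, shard_id)
    # Skip everything already covered by the last checkpoint.
    for token in itertools.islice(tokens, offset, None):
        sink.append(token)
        offset += 1
        if offset % every == 0:
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=16).hexdigest()
            with conn:  # commits on success
                conn.execute(
                    "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?, ?)",
                    (job_id, shard_id, digest, offset),
                )

conn = open_checkpoints()
words = ["a", "b", "c", "d", "e"]
sink = []
run_shard(iter(words[:4]), sink, conn, "job-1", 0)  # simulated interruption after 4 tokens
run_shard(iter(words), sink, conn, "job-1", 0)      # resumes at offset 4
print(sink)  # ['a', 'b', 'c', 'd', 'e']
```

If a run stops between checkpoints, the tokens written after the last checkpoint are generated again on resume, so a real sink would need to be truncated back to the stored offset first.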
- TLS everywhere, OAuth2 + scopes, per-tenant KMS (Key Management Service) integration.
- Field-level encryption for sensitive fields.
- Audit logs immutable via append-only storage.
- Plugin runtime sandbox using WASM + syscall filter.
- RBAC enforced on API & GUI.
- GDPR: per-user data erasure endpoint; export redaction templates.
- PII: PII scanner warns when fields likely to contain PII and requires explicit confirmation.
- Terms of Service & Acceptable Use: users must confirm legal usage (testing only on authorized targets). Company-level legal team to craft TOS and export controls.
- Marketplace vetting: review paid rulepacks to avoid malicious content; sign & certify packs.
- Unit tests: transforms, combinator engine, filters.
- Integration tests: end-to-end small job generation, resume flow.
- Performance tests: IO throughput, Bloom filter scaling, distributed generation.
- Fuzz tests: malformed DSL, regex edge cases (catastrophic backtracking).
- Security tests: plugin sandbox escape attempts, RBAC bypass, input validation.
- CI pipeline: run deterministic seeds, assert sample outputs.
- Canary & Staging: small production-like cluster for heavy jobs.
- Repo bootstrap, basic README, field schema v0 (500 fields), basic CLI.
- Implement streaming single-node generator (clean approach).
- Implement transforms: leet, reverse, padding, simple homoglyphs.
- SQLite checkpointing, gz sink, basic Bloom dedupe.
- CLI commands: run, preview, resume.
- Docs & sample presets (pentest_default).
- API server (FastAPI), job scheduler (Celery).
- Electron GUI with Visual Rule Builder (preview pane).
- Cardinality estimator, canary sample mode, preflight checks.
- S3 sink, chunking, per-chunk checksums.
- RBAC, audit logs, KMS integration, GDPR endpoints.
- Hashcat/John exports, VSCode extension, GitHub Action.
- Marketplace MVP (upload/publish rulepacks).
- End of Week 4: CLI + streaming generator + sample presets.
- End of Week 8: GUI Beta + API server + S3 output.
- End of Week 12: Enterprise features + marketplace MVP.
/omniwordlist
├─ /docs
├─ /backend
│ ├─ api/ # FastAPI app
│ ├─ jobs/ # Job scheduler + worker code
│ ├─ generator/ # combinator engine, transforms
│ ├─ storage/ # S3 / local storage adapters
│ └─ tests/
├─ /cli
│ └─ omni (entrypoint)
├─ /frontend
│ ├─ electron-app
│ └─ web/
├─ /plugins
├─ /presets
├─ /field-packs
├─ /infrastructure
│ ├─ k8s/
│ └─ terraform/
└─ docker-compose.yml
- generator/engine.py — pipeline orchestration (seed→transform→filter→sink)
- generator/transforms/* — specific transforms (leet, homoglyph, phonetic)
- generator/dedupe/* — Bloom/RocksDB adapters
- api/jobs.py — job submission & validation
- cli/omni — CLI wrapper to call API or run local engine
- frontend/visual-builder — React components & Monaco integration
{
"name": "pentest_default_v1",
"fields": ["company_name", "dev_handles", "first_name_male", "last_name", "birth_year", "common_suffixes"],
"transforms": ["leet_basic","append_numbers_best_4","titlecase_names"],
"filters": {"min_len":8,"max_len":30,"charset":"ascii"},
"output": {"type":"file","path":"./out/pentest_default_v1.gz"}
}
{
"name": "meme_humor_pack",
"fields": ["fav_meme_format","favorite_joke","favorite_pun","go_to_reaction_emoji"],
"transforms": ["emoji_insertion","random_case"],
"filters": {"min_len":3,"max_len":140},
"output": {"type":"jsonl","path":"./out/meme_pack.jsonl"}
}
- Pricing Tiers: Free (limited lines/day), Pro ($49/mo), Team ($199/mo), Enterprise (custom).
- Monetizable features: Pay-per-job heavy exports, premium transforms (ML phonetics), marketplace rulepacks, priority SLA.
- Marketing: OSS launch of CLI core on GitHub (MIT), blog posts (Hacker News, r/netsec), demos with bug-bounty teams, webinars.
- Support & Onboarding: Docs + video tutorials, community Discord, paid enterprise onboarding.
- SLA & Ops: K8s cluster with autoscaling, backups (S3 lifecycle), incident runbooks.
- Create GitHub repo omniwordlist with README.md, CONTRIBUTING.md, and license.
- Implement generator/engine.py streaming pipeline (use clean approach code pattern).
- Implement transforms: leet_basic, reverse, append_numbers_best_4, emoji_insertion.
- Add SQLite checkpointing & Bloom dedupe adapter with RocksDB fallback.
- Create presets/pentest_default.json and presets/meme_humor_pack.json.
- Build CLI omni run --config and omni preview --sample.
- Draft FEATURES.md listing all 120+ advanced feature flags with short descriptions.
- Build cardinality estimator & preflight sample runner (1k lines).
- Create issue templates and a project board for Phase 1 tasks.
- Prepare release v0.1 (CLI only) and an initial demo video.
{
"features": {
"adaptive_combinator_cap": true,
"weighted_combinator_generation": true,
"homoglyph_injection_engine": true,
"signed_encrypted_artifacts": false,
"distributed_generation": false,
"visual_rule_builder": false,
"gdpr_data_erasure": false,
"ml_seed_suggestion": false,
"plugin_system_wasm": false
}
}
- User uploads/creates preset.
- User requests preview → system runs sample job (1k).
- User approves full job → system partitions job, spins up workers.
- Workers stream tokens to S3 with per-chunk checksums.
- On completion, job metadata + manifest created; user notified.
Users must confirm they have authorization to test any systems, accounts, or targets against which generated wordlists are used. OmniWordlist Pro is a defensive tool. Usage against unauthorized targets is prohibited.