The definitive high-performance document engine for the AI era.
Convert PDF, DOCX, and XLSX to clean Markdown & NLP datasets in milliseconds.
bgustdown is a document-conversion and data-preparation engine written from the ground up in Rust. It targets the performance bottlenecks in AI data pipelines, providing the speed and semantic precision required for production-grade RAG and NLP fine-tuning.
This repository uses a multi-branch distribution strategy to keep core development separate from the production-ready packages.
| Branch | Target Ecosystem | Description |
|---|---|---|
| `main` | Source / Dev | The raw Rust source code and development manifests. |
| `npm` | Node.js / npm | Production-ready NAPI bindings and CLI tool. |
| `crates` | Rust / Cargo | Pure Rust library distribution for crates.io. |
| `skill` | AI Agents | Native AI Skill installer and capabilities manifest. |
If you want to contribute or build the engine from source:

    git clone https://github.com/B-GUST/bgustdown.git
    cd bgustdown
    npm install && npm run build

To add the AI skill:

    npx skill add https://github.com/B-GUST/bgustdown

To install a packaged release:

- Node.js: `npm install bgustdown`
- Rust: `cargo add bgustdown`

| Command | Action |
|---|---|
| `npx bgustdown convert <path>` | Universal conversion to Markdown. |
| `npx bgustdown prepare <path>` | Semantic NLP dataset preparation. |
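
As an illustration of scripting the `convert` command, here is a minimal sketch that walks a directory and converts each PDF, DOCX, and XLSX file it finds. The `convert_all` function and the `BGUSTDOWN_CMD` variable are hypothetical names introduced for this example and are not part of bgustdown. The sketch assumes only the `npx bgustdown convert <path>` form and makes no assumption about where the output is written.

```shell
# Hypothetical helper: run `bgustdown convert` on every supported file in a directory tree.
# BGUSTDOWN_CMD is an assumed override hook (useful for dry runs); it defaults to `npx bgustdown`.
convert_all() {
  dir="${1:-.}"
  runner="${BGUSTDOWN_CMD:-npx bgustdown}"

  # Match only the three input formats this README names.
  find "$dir" -type f \( -name '*.pdf' -o -name '*.docx' -o -name '*.xlsx' \) -print |
  while IFS= read -r file; do
    # $runner is intentionally unquoted so "npx bgustdown" splits into two words.
    # stdin is redirected so the command cannot consume the file list being read.
    $runner convert "$file" </dev/null || echo "failed: $file" >&2
  done
}
```

Running `BGUSTDOWN_CMD=echo; convert_all ./docs` prints the commands that would be issued instead of executing them, which is a cheap way to check the file selection before a real run.
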
Version: 0.1.2 (Stable) | Official Docs: bgustdown.lat