⚡ Scrape and process large file trees fast, with natively accelerated directory walking.
FastFileScrape is the high‑speed file scraping module of the FastJava ecosystem.
It provides two core capabilities:
- FastFileTree — build complete directory trees with include/exclude rules
- FastFileScrapeContent — extract file contents with chunking for LLMs and agents

FastFileTree features:
- Recursive directory walking
- Include/Exclude glob filters
- Sorted output (folders → files)
- JSON or ASCII tree output
- Git‑ignore aware (optional)

FastFileScrapeContent features:
- Extracts file contents with UTF‑8 safety
- Chunking by byte size or newline boundaries
- Include/Exclude patterns
- JSONL or plain text output
- Ideal for LLM context ingestion

CLI modes and output:
- tree → structure only
- content → file contents only
- all → both combined
- Output to stdout or file
- JSONL mode for AI pipelines
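
The feature list mentions chunking by byte size or newline boundaries with UTF‑8 safety. The following is a minimal sketch of what such chunking can look like in plain Java. It is not the library's implementation: the class `ChunkSketch`, the method `chunk`, and the `maxBytes` parameter are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of FastFileScrape.
final class ChunkSketch {

    /** Splits text into chunks of at most maxBytes UTF-8 bytes, preferring to cut after a newline. */
    static List<String> chunk(String text, int maxBytes) {
        if (maxBytes < 4) {
            // 4 bytes is the longest UTF-8 sequence; smaller windows could not hold every character.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < bytes.length) {
            int end = Math.min(start + maxBytes, bytes.length);
            if (end < bytes.length) {
                // Prefer the last newline inside the window so lines stay intact.
                int cut = end;
                while (cut > start && bytes[cut - 1] != '\n') {
                    cut--;
                }
                if (cut > start) {
                    end = cut;
                } else {
                    // No newline in the window: step back off UTF-8 continuation bytes (10xxxxxx)
                    // so that a multi-byte character is never split across two chunks.
                    while (end > start && (bytes[end] & 0xC0) == 0x80) {
                        end--;
                    }
                }
            }
            chunks.add(new String(bytes, start, end - start, StandardCharsets.UTF_8));
            start = end;
        }
        return chunks;
    }
}
```

Concatenating the returned chunks reproduces the input text, which makes the sketch easy to check against real files.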
# Show directory tree
fastfilescrape tree --root . --include "**/*.java"
# Extract file contents
fastfilescrape content --root . --include "**/*.java" --out repo.txt
# Tree + Content in JSONL
fastfilescrape all --root . --include "**/*.java" --format jsonl --out repo.jsonl
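
To make the include/exclude semantics of patterns such as `**/*.java` concrete, here is a small sketch that uses the JDK's own `PathMatcher` as a stand-in. It does not call FastGLOB, whose matching rules may differ; the class `GlobFilterSketch` and the method `accept` are hypothetical names.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

// Hypothetical sketch built on java.nio globbing, not on FastGLOB.
final class GlobFilterSketch {

    private static PathMatcher glob(String pattern) {
        // A real filter would compile each pattern once and reuse the matcher.
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }

    /** Accepts a relative path if it matches any include glob (or none are given) and no exclude glob. */
    static boolean accept(Path relative, List<String> includes, List<String> excludes) {
        boolean included = includes.isEmpty()
                || includes.stream().anyMatch(p -> glob(p).matches(relative));
        boolean excluded = excludes.stream().anyMatch(p -> glob(p).matches(relative));
        return included && !excluded;
    }

    public static void main(String[] args) {
        List<String> includes = List.of("**/*.java", "*.java");
        List<String> excludes = List.of("target/**");
        for (String candidate : List.of("src/main/Demo.java", "Demo.java", "target/gen/Demo.java")) {
            System.out.println(candidate + " -> " + accept(Path.of(candidate), includes, excludes));
        }
    }
}
```

In the JDK's glob dialect, `**/*.java` requires at least one directory separator, so the sketch adds `*.java` to cover files directly under the root. Whether FastGLOB treats root-level files the same way is not stated here.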
import fastfilescrape.*;

import java.nio.file.Path;
import java.util.List;

public class Demo {
    public static void main(String[] args) throws Exception {
        // Tree: build the directory tree for the current directory and print it
        var tcfg = new FastFileTree.Config();
        tcfg.root = Path.of(".");
        var tree = FastFileTree.build(tcfg);
        FastFileTree.printTree(tree, System.out);

        // Content: scrape all Java files and print each chunk as it arrives
        var ccfg = new FastFileScrapeContent.Config();
        ccfg.root = Path.of(".");
        ccfg.includeGlobs = List.of("**/*.java");
        FastFileScrapeContent.scrape(ccfg, (file, chunk, text) -> {
            System.out.println("=== " + file + " (chunk " + chunk + ") ===");
            System.out.println(text);
        });
    }
}

Add the JitPack repository and the dependencies to your pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>FastFileScrape</artifactId>
        <version>v0.1.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>FastGLOB</artifactId>
        <version>v0.1.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>FastCore</artifactId>
        <version>v1.0.0</version>
    </dependency>
</dependencies>

Or, with Gradle, in your build.gradle:

repositories {
    maven { url 'https://jitpack.io' }
}
dependencies {
    implementation 'com.github.andrestubbe:FastFileScrape:v0.1.0'
    implementation 'com.github.andrestubbe:FastGLOB:v0.1.0'
    implementation 'com.github.andrestubbe:FastCore:v1.0.0'
}

Alternatively, download the pre-compiled JARs and add them to your classpath:
- 📦 FastFileScrape-v0.1.0.jar (The Scraper Core Library)
- 📦 FastGlob-v0.1.0.jar (The Native Glob Matching Library)
- ⚙️ fastcore-v1.0.0.jar (The Mandatory JNI Loader)

Important: Because FastFileScrape is natively accelerated, all three JARs must be on your classpath for the JNI-accelerated directory walking to work correctly on Windows.

FastFileTree:

| Method | Description |
|---|---|
| Node build(Config cfg) | Builds the directory tree |
| printTree(Node, Appendable) | Prints ASCII tree |
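
As an illustration of what building and printing a sorted tree involves, the sketch below walks a directory with the standard `java.nio.file` API and renders an ASCII tree with folders listed before files. It is an assumption-laden stand-in, not the library's code: `TreeSketch`, its `Node` class, and `render` are hypothetical, and the real `Node` and `Config` types may look quite different.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch using only the JDK, not FastFileTree.
final class TreeSketch {

    /** Hypothetical node type for this sketch only. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Recursively builds a tree; folders come before files, each group sorted by name. */
    static Node build(Path path) throws IOException {
        Path fileName = path.getFileName(); // null for a filesystem root
        Node node = new Node(fileName == null ? path.toString() : fileName.toString());
        if (Files.isDirectory(path)) {
            List<Path> sorted;
            try (Stream<Path> entries = Files.list(path)) {
                sorted = entries
                        .sorted(Comparator
                                .comparing((Path p) -> !Files.isDirectory(p))
                                .thenComparing(p -> p.getFileName().toString()))
                        .collect(Collectors.toList());
            }
            for (Path child : sorted) {
                node.children.add(build(child));
            }
        }
        return node;
    }

    /** Renders the tree as ASCII text, one entry per line. */
    static String render(Node root) {
        StringBuilder out = new StringBuilder(root.name).append('\n');
        renderChildren(root, "", out);
        return out.toString();
    }

    private static void renderChildren(Node node, String prefix, StringBuilder out) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = i == node.children.size() - 1;
            out.append(prefix).append(last ? "`-- " : "|-- ").append(child.name).append('\n');
            renderChildren(child, prefix + (last ? "    " : "|   "), out);
        }
    }
}
```

Calling `TreeSketch.render(TreeSketch.build(Path.of(".")))` returns the tree of the current directory as a string. The sketch applies no include/exclude or ignore rules.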

FastFileScrapeContent:

| Method | Description |
|---|---|
| scrape(Config cfg, Sink sink) | Reads files and emits chunks |
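
A sink that receives `(file, chunk, text)` can turn each chunk into one JSONL record. The sketch below shows one way to do that with hand-written JSON escaping and no external libraries. The class `JsonlSketch`, the method `toJsonLine`, and the field names `file`, `chunk`, and `text` are assumptions for illustration; the record layout that the library's own JSONL mode emits is not documented here.

```java
// Hypothetical sketch, not part of FastFileScrape.
final class JsonlSketch {

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be written as \\uXXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds one JSONL record; the result never contains a raw line break. */
    static String toJsonLine(String file, int chunk, String text) {
        return "{\"file\":\"" + escape(file) + "\",\"chunk\":" + chunk
                + ",\"text\":\"" + escape(text) + "\"}";
    }
}
```

Inside a callback, each record would be written followed by a single `\n`, for example `writer.write(JsonlSketch.toJsonLine(file.toString(), chunk, text) + "\n")`, assuming `chunk` is an integer index.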
- COMPILE.md: Full compilation guide (MSVC C++17 build chain + JNI Setup).
- REFERENCE.md: Full API descriptions, border configurations, and codepoint index.
- PHILOSOPHIE.md: The engineering rationale for zero-allocation performance.
- ROADMAP.md: Future milestones and planned features.
| Platform | Status |
|---|---|
| Windows 10/11 | ✅ Fully Supported |
| Linux | 🚧 Planned |
| macOS | 🚧 Planned |
MIT License — See LICENSE file for details.
- FastFileIndex — Ultra-fast filesystem scanner
- FastFileContentIndex — High-speed in-file text indexing
- FastFileWatch — High-performance directory watch service using USN Journal
- FastFileSearch — Ultra-fast indexed file prefix trie search
- FastGLOB — Ultra-fast native Win32 glob matching and traversal
- FastFileSystem — Unified filesystem operations (Index, Search, Watch, Scrape) in one API
Part of the FastJava Ecosystem — Making the JVM faster. Small package. Maximum speed. Zero bloat. 🚀📋
