filescan is a lightweight Python tool for scanning filesystem structures and Python ASTs and exporting them as flat, graph-style representations.
Instead of nested trees, filescan produces stable lists of nodes with parent pointers, making the output:
- easy to post-process
- friendly for CSV / DataFrame / SQL pipelines
- efficient for LLM ingestion and summarization
filescan can operate at two levels:
- filesystem structure (directories & files)
- Python semantic structure (modules, classes, functions, methods)
Both use the same flat graph design and export formats.
- Recursive directory traversal
- Flat node list with explicit `parent_id`
- Deterministic ordering
- Optional `.gitignore`-style filtering
- CSV and JSON export
- Module, class, function, and method detection
- Nested functions and classes supported
- Stable symbol IDs with parent relationships
- Best-effort function signature extraction
- First-line docstring capture
- Shared schema + export model
- Same API for filesystem and AST scanners
- Usable as both a library and a CLI
- Designed for automation, data pipelines, and AI workflows

Install with pip:

    pip install filescan

Or for development:

    pip install -e .

Scan the current directory and write a CSV:

    filescan

Scan a specific directory:

    filescan ./data

Export as JSON:

    filescan ./data --format json

Specify output base path:

    filescan ./data -o out/tree

This generates:

    out/
    ├── tree.csv
    └── tree.json

Scan Python source files and extract symbols:

    filescan ./src --ast

Export AST symbols as JSON:

    filescan ./src --ast --format json

Custom output path:

    filescan ./src --ast -o out/symbols

This generates:

    out/
    ├── symbols.csv
    └── symbols.json

filescan supports gitignore-style patterns via pathspec.
- If `--ignore-file` is provided, use it
- Otherwise, look for `./.fscanignore` (current working directory)
Ignore rules apply to:
- filesystem scanning
- AST scanning (Python files are skipped if ignored)

Example ignore patterns:

    .git/
    .idea/
    build/
    dist/
    __pycache__/
    *.pyc

Both filesystem and AST scans produce flat graphs with schema metadata.

| Field | Description |
|---|---|
| `id` | Unique integer ID |
| `parent_id` | Parent node ID (null for root) |
| `type` | 'd' = directory, 'f' = file |
| `name` | Base name |
| `size` | File size in bytes (null for directories) |
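
Because every node carries only its base name and a `parent_id`, full paths are recovered by walking parent pointers up to the root. The sketch below assumes a CSV export in which `#` comment lines precede the header row; the sample rows and the helper names `load_nodes` and `full_path` are illustrative and not part of filescan.

```python
import csv
import io

# Illustrative export following the filesystem schema (rows are made up).
SAMPLE_CSV = """\
# id: Unique integer ID for this node
# parent_id: ID of parent node, or null for root
id,parent_id,type,name,size
0,,d,data,
1,0,f,example.txt,128
2,0,d,raw,
3,2,f,a.bin,2048
"""


def load_nodes(text):
    """Parse the CSV into a dict keyed by id, skipping '#' comment lines."""
    lines = [ln for ln in io.StringIO(text) if not ln.startswith("#")]
    nodes = {}
    for row in csv.DictReader(lines):
        nodes[int(row["id"])] = {
            # Empty CSV fields stand for null.
            "parent_id": int(row["parent_id"]) if row["parent_id"] else None,
            "type": row["type"],
            "name": row["name"],
            "size": int(row["size"]) if row["size"] else None,
        }
    return nodes


def full_path(nodes, node_id):
    """Follow parent pointers to the root and join the collected names."""
    parts = []
    current = node_id
    while current is not None:
        parts.append(nodes[current]["name"])
        current = nodes[current]["parent_id"]
    return "/".join(reversed(parts))


nodes = load_nodes(SAMPLE_CSV)
paths = {node_id: full_path(nodes, node_id) for node_id in nodes}
print(paths[3])
```

Keeping paths out of the export and deriving them on demand is what keeps each row short; the cost is one upward walk per lookup.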

Example CSV output:

    # id: Unique integer ID for this node
    # parent_id: ID of parent node, or null for root
    # type: Node type: 'd' = directory, 'f' = file
    # name: Base name of the file or directory
    # size: File size in bytes; null for directories
    id,parent_id,type,name,size
    0,,d,data,
    1,0,f,example.txt,128

| Field | Description |
|---|---|
| id | Unique integer ID for this symbol |
| parent_id | Parent symbol ID (null for module) |
| kind | `module`, `class`, `function`, or `method` |
| name | Symbol name |
| module_path | File path relative to scan root |
| lineno | Starting line number (1-based) |
| signature | Function or method signature (best-effort) |
| doc | First line of docstring, if any |
Nested functions and classes are represented naturally via parent_id.
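
To see how nesting falls out of `parent_id`, the sketch below groups symbols by parent and prints an indented outline. The symbol rows are hypothetical and carry only a subset of the schema fields; the `outline` helper is not part of filescan.

```python
from collections import defaultdict

# Hypothetical symbol rows (subset of the AST schema fields).
symbols = [
    {"id": 0, "parent_id": None, "kind": "module", "name": "shapes"},
    {"id": 1, "parent_id": 0, "kind": "class", "name": "Circle"},
    {"id": 2, "parent_id": 1, "kind": "method", "name": "area"},
    {"id": 3, "parent_id": 0, "kind": "function", "name": "make_circle"},
    {"id": 4, "parent_id": 3, "kind": "function", "name": "validate"},
]

# Index the flat list by parent so each lookup of children is O(1).
children = defaultdict(list)
for symbol in symbols:
    children[symbol["parent_id"]].append(symbol)


def outline(parent_id=None, depth=0):
    """Return indented lines for all symbols below parent_id, depth-first."""
    lines = []
    for symbol in children[parent_id]:
        lines.append("  " * depth + f'{symbol["kind"]} {symbol["name"]}')
        lines.extend(outline(symbol["id"], depth + 1))
    return lines


tree = outline()
print("\n".join(tree))
```

Here `validate` appears under `make_circle` purely because its `parent_id` points at that function; no separate nesting field is needed.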

Scan a directory tree from Python:

    from filescan import Scanner

    scanner = Scanner(
        root="data",
        ignore_file=".fscanignore",
    )
    scanner.scan()
    scanner.to_csv()   # -> ./data.csv
    scanner.to_json()  # -> ./data.json

Scan Python symbols:

    from filescan import AstScanner

    scanner = AstScanner(
        root="src",
        ignore_file=".fscanignore",
        output="out/symbols",
    )
    scanner.scan()
    scanner.to_csv()
    scanner.to_json()

Work with the scanned nodes in memory:

    nodes = scanner.scan()
    print(len(nodes))
    data = scanner.to_dict()

Most filesystem and code structures are represented as deeply nested trees. While human-readable, they are verbose, hard to query, and inefficient for large-scale processing.
filescan represents both filesystems and codebases as flat graphs because this format is:
- Compact and token-efficient: Flat lists with numeric IDs consume far fewer tokens than recursive trees, making them ideal for LLM context windows.
- Explicit and unambiguous: All relationships are encoded directly via `parent_id`.
- Easy to process: Flat data works naturally with filtering, joins, grouping, and graph analysis.
This makes filescan especially suitable for:
- SQL / Pandas / DuckDB pipelines
- Static analysis and refactoring tools
- Graph-based code understanding
- LLM-based reasoning and summarization of projects
In short, filescan favors machine-friendly structure over visual trees, enabling scalable, AI-native workflows.
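
As one illustration of the "joins and grouping" point, the sketch below loads rows shaped like the filesystem schema into an in-memory SQLite table, then uses a self-join and a recursive query to total file sizes. The rows and the table name `nodes` are assumptions made for the example; filescan itself does not create a database.

```python
import sqlite3

# Hypothetical rows: (id, parent_id, type, name, size).
rows = [
    (0, None, "d", "data", None),
    (1, 0, "f", "example.txt", 128),
    (2, 0, "d", "raw", None),
    (3, 2, "f", "a.bin", 2048),
    (4, 2, "f", "b.bin", 1024),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "type TEXT, name TEXT, size INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", rows)

# Self-join: total size of the files directly inside each directory.
direct = conn.execute(
    """
    SELECT d.name, SUM(f.size)
    FROM nodes AS d
    JOIN nodes AS f ON f.parent_id = d.id AND f.type = 'f'
    WHERE d.type = 'd'
    GROUP BY d.id
    ORDER BY d.id
    """
).fetchall()

# Recursive CTE: collect every descendant of the root (id 0), then sum sizes.
total = conn.execute(
    """
    WITH RECURSIVE subtree(id) AS (
        SELECT 0
        UNION ALL
        SELECT n.id FROM nodes AS n JOIN subtree AS s ON n.parent_id = s.id
    )
    SELECT SUM(size) FROM nodes WHERE id IN (SELECT id FROM subtree)
    """
).fetchone()[0]

print(direct, total)
```

The same two queries would be awkward against a nested JSON tree, which is the practical argument for the flat layout.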
The project uses a src/ layout.
Examples can be run without installation:

    python examples/scan_data.py

Or as a module:

    python -m examples.scan_data

MIT License