filescan

filescan is a lightweight Python tool for scanning filesystem structures and Python ASTs and exporting them as flat, graph-style representations.

Instead of nested trees, filescan produces stable lists of nodes with parent pointers, making the output:

easy to post-process
friendly for CSV / DataFrame / SQL pipelines
efficient for LLM ingestion and summarization

filescan can operate at two levels:

filesystem structure (directories & files)
Python semantic structure (modules, classes, functions, methods)

Both use the same flat graph design and export formats.
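
To make the flat-graph idea concrete, here is a minimal sketch of how a directory tree can be flattened into a node list with parent pointers. It is not filescan's implementation: the `scan_flat` function and the tuple layout are assumptions made for illustration, loosely following the filesystem schema fields described later in this README.

```python
import os
from typing import List, Optional, Tuple

# Hypothetical node layout: (id, parent_id, type, name, size)
Node = Tuple[int, Optional[int], str, str, Optional[int]]


def scan_flat(root: str) -> List[Node]:
    """Walk `root` and return a flat list of nodes with parent pointers."""
    nodes: List[Node] = []

    def visit(path: str, parent_id: Optional[int]) -> None:
        node_id = len(nodes)  # IDs are assigned in visit order
        name = os.path.basename(os.path.abspath(path))
        if os.path.isdir(path):
            nodes.append((node_id, parent_id, "d", name, None))
            # Sorting the entries keeps the output order deterministic.
            for entry in sorted(os.listdir(path)):
                visit(os.path.join(path, entry), node_id)
        else:
            nodes.append((node_id, parent_id, "f", name, os.path.getsize(path)))

    visit(root, None)
    return nodes
```

Each node records only its own parent, so the hierarchy is preserved without nesting. The sketch does not treat symlinks specially and applies no ignore rules.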

Features

Filesystem scanning

Recursive directory traversal
Flat node list with explicit parent_id
Deterministic ordering
Optional .gitignore-style filtering
CSV and JSON export

Python AST scanning

Module, class, function, and method detection
Nested functions and classes supported
Stable symbol IDs with parent relationships
Best-effort function signature extraction
First-line docstring capture

General

Shared schema + export model
Same API for filesystem and AST scanners
Usable as both a library and a CLI
Designed for automation, data pipelines, and AI workflows

Installation

pip install filescan

Or for development:

pip install -e .

Quick start (CLI)

Filesystem scan (default)

Scan the current directory and write a CSV:

filescan

Scan a specific directory:

filescan ./data

Export as JSON:

filescan ./data --format json

Specify output base path:

filescan ./data -o out/tree

This generates:

out/
├── tree.csv
└── tree.json

Python AST scan

Scan Python source files and extract symbols:

filescan ./src --ast

Export AST symbols as JSON:

filescan ./src --ast --format json

Custom output path:

filescan ./src --ast -o out/symbols

This generates:

out/
├── symbols.csv
└── symbols.json

Ignore rules (`.fscanignore`)

filescan supports gitignore-style patterns via pathspec.

Default behavior

If --ignore-file is provided, filescan uses that file.
Otherwise, it looks for a .fscanignore file in the current working directory:

./.fscanignore

Ignore rules apply to:

filesystem scanning
AST scanning (Python files are skipped if ignored)

Example `.fscanignore`

.git/
.idea/
build/
dist/
__pycache__/
*.pyc
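
To illustrate how the patterns in this example behave, here is a deliberately simplified matcher built on the standard library `fnmatch` module. It is not how filescan evaluates ignore rules (filescan delegates to pathspec) and it does not reproduce full gitignore semantics: it only handles the two pattern shapes used above, namely directory names ending in `/` and base-name globs. The `load_patterns` and `is_ignored` helpers are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List


def load_patterns(text: str) -> List[str]:
    """Drop blank lines and '#' comments from ignore-file text."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Approximate check for a file path given relative to the scan root."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches when any parent directory has that name.
            if any(fnmatchcase(part, pattern[:-1]) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pattern):
            # Pattern without a slash: compared against the base name only.
            return True
    return False
```

With the example file above, `build/lib/mod.py` and `src/pkg/mod.pyc` would be ignored under this approximation, while `src/pkg/mod.py` would be kept.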

Output formats

Both filesystem and AST scans produce flat graphs with schema metadata.

Filesystem schema

| Field | Description |
| --- | --- |
| `id` | Unique integer ID |
| `parent_id` | Parent node ID (`null` for root) |
| `type` | `'d'` = directory, `'f'` = file |
| `name` | Base name |
| `size` | File size in bytes (`null` for directories) |

CSV example

# id: Unique integer ID for this node
# parent_id: ID of parent node, or null for root
# type: Node type: 'd' = directory, 'f' = file
# name: Base name of the file or directory
# size: File size in bytes; null for directories
id,parent_id,type,name,size
0,,d,data,
1,0,f,example.txt,128
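
Because each row carries only a base name and a parent_id, full paths have to be rebuilt by following the parent pointers. The sketch below shows one way to do that with the standard library, assuming the exported file starts with `#` comment lines as in the example above. The `read_nodes` and `full_path` helpers are hypothetical and not part of filescan.

```python
import csv
from typing import Dict, Optional

CSV_TEXT = """\
# id: Unique integer ID for this node
# parent_id: ID of parent node, or null for root
# type: Node type: 'd' = directory, 'f' = file
# name: Base name of the file or directory
# size: File size in bytes; null for directories
id,parent_id,type,name,size
0,,d,data,
1,0,f,example.txt,128
"""


def read_nodes(text: str) -> Dict[int, dict]:
    """Parse the CSV, skipping the leading '#' schema comment lines."""
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    nodes = {}
    for row in csv.DictReader(data_lines):
        nodes[int(row["id"])] = {
            # Empty cells stand for null values.
            "parent_id": int(row["parent_id"]) if row["parent_id"] else None,
            "type": row["type"],
            "name": row["name"],
            "size": int(row["size"]) if row["size"] else None,
        }
    return nodes


def full_path(nodes: Dict[int, dict], node_id: int) -> str:
    """Follow parent_id pointers up to the root and join the names."""
    parts = []
    current: Optional[int] = node_id
    while current is not None:
        parts.append(nodes[current]["name"])
        current = nodes[current]["parent_id"]
    return "/".join(reversed(parts))


nodes = read_nodes(CSV_TEXT)
print(full_path(nodes, 1))  # data/example.txt
```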

Python AST schema

Nested functions and classes are represented naturally via parent_id.

Library usage

Filesystem scanner

from filescan import Scanner

scanner = Scanner(
    root="data",
    ignore_file=".fscanignore",
)

scanner.scan()
scanner.to_csv()    # -> ./data.csv
scanner.to_json()   # -> ./data.json

Python AST scanner

from filescan import AstScanner

scanner = AstScanner(
    root="src",
    ignore_file=".fscanignore",
    output="out/symbols",
)

scanner.scan()
scanner.to_csv()
scanner.to_json()

Programmatic access

nodes = scanner.scan()
print(len(nodes))

data = scanner.to_dict()

Why `filescan`?

Most filesystem and code structures are represented as deeply nested trees. While human-readable, they are verbose, hard to query, and inefficient for large-scale processing.

filescan represents both filesystems and codebases as flat graphs because this format is:

Compact and token-efficient: flat lists with numeric IDs consume far fewer tokens than recursive trees, making them ideal for LLM context windows.
Explicit and unambiguous: all relationships are encoded directly via parent_id.
Easy to process: flat data works naturally with filtering, joins, grouping, and graph analysis.

This makes filescan especially suitable for:

SQL / Pandas / DuckDB pipelines
Static analysis and refactoring tools
Graph-based code understanding
LLM-based reasoning and summarization of projects
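
As an illustration of the SQL use case, the sketch below loads a few nodes into an in-memory SQLite database and uses a recursive query to total the file sizes under each directory. The rows are made-up sample data in the shape of the filesystem schema, and the `nodes` table name is an assumption for this example.

```python
import sqlite3

rows = [
    # (id, parent_id, type, name, size): made-up sample data
    (0, None, "d", "data", None),
    (1, 0, "f", "example.txt", 128),
    (2, 0, "d", "logs", None),
    (3, 2, "f", "run.log", 64),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, type TEXT, name TEXT, size INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", rows)

# Pair every node with each of its ancestors, then sum file sizes per directory.
query = """
WITH RECURSIVE ancestry(node_id, ancestor_id) AS (
    SELECT id, parent_id FROM nodes WHERE parent_id IS NOT NULL
    UNION ALL
    SELECT a.node_id, n.parent_id
    FROM ancestry AS a
    JOIN nodes AS n ON n.id = a.ancestor_id
    WHERE n.parent_id IS NOT NULL
)
SELECT d.name, SUM(f.size) AS total_bytes
FROM ancestry AS a
JOIN nodes AS d ON d.id = a.ancestor_id
JOIN nodes AS f ON f.id = a.node_id AND f.type = 'f'
GROUP BY d.id
ORDER BY d.id
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('data', 192), ('logs', 64)]
```

The same table can be filtered or joined like any other relational data, which is harder to do with a nested tree.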

In short, filescan favors machine-friendly structure over visual trees, enabling scalable, AI-native workflows.

Development

The project uses a src/ layout.

Examples can be run without installation:

python examples/scan_data.py

Or as a module:

python -m examples.scan_data

License

MIT License
