Python CLI that cleans up source code encoding, line endings, and whitespace across entire codebases -- with parallel processing, SHA256 caching, and pre-commit hook support.
- Location: C:\Dev\PROJECTS\CODE
- Status: v3.0 code complete. Package stub ready. No pyproject.toml = blocked from PyPI.
- Updated: 2026-03-10
Run it against any directory and it will:
- Detect and convert file encoding to UTF-8 (handles utf-8, utf-8-sig, utf-16, utf-16-le, utf-16-be, windows-1252, latin-1, iso-8859-1)
- Fix line endings -- CRLF to LF
- Strip trailing whitespace from every line
- Ensure a single newline at end of file
- Optionally validate syntax for Python, JS, TS, Go, Rust, C, C++, Java
Files already clean are skipped. SHA256 caching means repeat runs on unchanged files are near-instant. Multi-core parallel mode handles large codebases at 80-200 files/sec.
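The line-ending and whitespace steps listed above can be sketched as a single pure function. This is a minimal illustration, not the code in src/code_normalize_pro.py; the name normalize_text and the choice to leave an empty file empty are assumptions.

```python
def normalize_text(text: str) -> str:
    """Return text with LF endings, no trailing whitespace, and one final newline."""
    # Convert CRLF line endings to LF.
    text = text.replace("\r\n", "\n")
    # Strip trailing whitespace from every line.
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop trailing blank lines so the file ends with exactly one newline.
    while lines and lines[-1] == "":
        lines.pop()
    # Assumption: a file with no content stays empty rather than gaining a newline.
    return "\n".join(lines) + "\n" if lines else ""


def is_clean(text: str) -> bool:
    """A file is already clean when normalizing it changes nothing."""
    return normalize_text(text) == text
```

Because the function is idempotent, comparing its output with its input is enough to decide whether a file can be skipped.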
Set-Location C:\Dev\PROJECTS\CODE
# See what would change without touching anything
python main.py C:\path\to\project --dry-run
# Normalize everything in-place using all CPU cores
python main.py C:\path\to\project --parallel --in-place
# Normalize only Python and JavaScript files
python main.py C:\path\to\project -e .py -e .js --in-place
# Review and approve each file before it's written
python main.py C:\path\to\project --interactive
# Run syntax validation after normalizing
python main.py C:\path\to\project --in-place --check
# Install a pre-commit hook into a git repo
cd C:\your-repo
python C:\Dev\PROJECTS\CODE\main.py --install-hook
main.py at root is a thin wrapper that delegates to src/code_normalize_pro.py.
Call either one -- same result.
Checks only staged files before each commit. Blocks commit if any need normalization and prints the fix command.
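The staged-files check can be sketched roughly as follows. This is an illustration under stated assumptions, not the hook that --install-hook writes: the function names are hypothetical, the check reads the working-tree copy of each file rather than the staged blob, and the printed message is a placeholder for the real fix command.

```python
import subprocess


def staged_files() -> list[str]:
    """List paths staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_normalization(data: bytes) -> bool:
    """Rough check: CRLF endings, trailing whitespace, or a bad final newline."""
    if b"\r\n" in data:
        return True
    if any(line != line.rstrip() for line in data.split(b"\n")):
        return True
    # Non-empty files should end with exactly one newline.
    return bool(data) and (not data.endswith(b"\n") or data.endswith(b"\n\n"))


def check_staged() -> int:
    """Return 1 (block the commit) if any staged file needs normalization."""
    dirty = []
    for path in staged_files():
        with open(path, "rb") as fh:
            if needs_normalization(fh.read()):
                dirty.append(path)
    if dirty:
        print("Files need normalization; rerun the tool with --in-place:")
        for path in dirty:
            print(f"  {path}")
        return 1
    return 0
```

A hook script would call check_staged() and exit with its return value, since git aborts the commit on any non-zero exit.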
# One-time install per repo
cd C:\your-repo
python C:\Dev\PROJECTS\CODE\main.py --install-hook
# Commit as normal -- hook fires automatically
git commit -m "your message"
# Skip hook for one commit
git commit --no-verify -m "your message"

| Files | Sequential | Parallel 4-core | Speedup |
|---|---|---|---|
| 100 | 3.2s | 1.1s | 2.9x |
| 500 | 16.8s | 4.3s | 3.9x |
| 1000 | 33.5s | 7.1s | 4.7x |
8 cores: 150-200 files/sec. SHA256 cache on unchanged files: 500-1000 files/sec.
Workers default to CPU count. Override with --workers N.
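The cache speedup comes from hashing a file and comparing the digest to the one recorded on the last clean run. The sketch below shows that idea with a hypothetical HashCache class; it is not the tool's CacheManager, and the JSON layout is an assumption.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA256 digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class HashCache:
    """Hypothetical cache mapping file path -> SHA256 of its last known-clean content."""

    def __init__(self, cache_path: Path):
        self.cache_path = cache_path
        self.entries: dict[str, str] = {}
        if cache_path.exists():
            self.entries = json.loads(cache_path.read_text(encoding="utf-8"))

    def is_unchanged(self, path: Path) -> bool:
        """True when the file's current digest matches the recorded one."""
        return self.entries.get(str(path)) == file_sha256(path)

    def mark_clean(self, path: Path) -> None:
        """Record the current digest after the file was verified or normalized."""
        self.entries[str(path)] = file_sha256(path)

    def save(self) -> None:
        """Persist the cache as JSON."""
        self.cache_path.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
```

A run would skip any file for which is_unchanged() returns True, call mark_clean() on every file it leaves in a normalized state, and save() once at the end.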
Set-Location C:\Dev\PROJECTS\CODE
.\.venv\Scripts\Activate.ps1
python -m pytest -q
python main.py --help
Test files in tests/ cover the main tool plus all four launch/sales scripts.
All 5 features were tested on 2026-02-09 (see docs/TEST_REPORT.md). Manual confirmation of interactive mode is still pending.
CODE/
main.py -- Root entrypoint. Delegates to src/code_normalize_pro.py
src/
code_normalize_pro.py -- v3.0 Pro. 917 lines. The active tool.
code_normalize_v2.py -- v2.0. Kept for reference.
code_normalizer_pro/ -- PyPI package stub
__init__.py -- Exposes __version__ = "3.0.1"
cli.py -- Console entry point (calls src/code_normalize_pro.py)
README.md
config/
settings.py -- Env-var settings loader (not wired up yet)
docs/
README.md -- Full feature reference docs
TEST_REPORT.md -- Test results from 2026-02-09
ARCHITECTURE.md -- Stub
launch/ -- Outreach templates, user tracking CSV, metrics JSON
sales/ -- Pricing, pipeline CSV, customer offer template
release/
alpha_release_checklist.md -- Step-by-step PyPI publish checklist
release_readiness.json -- Says ready=true, wheel+sdist listed
roadmaps/
README.md -- Overview of all 6 paths
01_solo_dev_tool.md -- CHOSEN: bootstrap to PyPI
02_dev_tool_saas.md
03_enterprise_platform.md
04_open_source_support.md
05_grammarly_for_code.md
06_ai_transformation_engine.md
scripts/
launch_metrics.py
feedback_prioritizer.py
sales_pipeline_metrics.py
release_prep.py
tests/
test_code_normalize_pro.py
test_feedback_prioritizer.py
test_launch_metrics.py
test_release_prep.py
test_sales_pipeline_metrics.py
site/
index.html -- Static landing page
styles.css
.github/
workflows/ci.yml -- CI: install, smoke check, pytest, build
ISSUE_TEMPLATE/
pull_request_template.md
files/
cache_sandbox/ -- Test fixtures (a.py, b.py)
smoke_case.py
EXECUTION_PLAN.md -- 7-day launch plan (all tasks pending)
VERIFY.md -- Verification runbook
MISSINGMORE.txt -- Gap tracking
QUICK_REFERENCE.md -- Command cheat sheet
CHANGELOG.md -- Stub (unreleased only)
Core: zero dependencies. Python 3.10+ only.
Optional:
- tqdm -- progress bars
- Syntax checkers (only needed with --check): Python: built-in (py_compile) | JS: node | TS: tsc | Go: gofmt | Rust: rustc | C: gcc | C++: g++ | Java: javac
Dev/test: pytest (see requirements.txt)
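As a rough illustration of how the optional checkers fit together, the sketch below validates Python with the built-in py_compile and reports which external tools are missing from PATH. The function names and the extension-to-tool mapping are assumptions made for this example, and it does not show how the tool itself invokes each checker.

```python
import os
import py_compile
import shutil
import tempfile

# Assumed file extensions for the external tools listed above.
EXTERNAL_CHECKERS = {
    ".js": "node", ".ts": "tsc", ".go": "gofmt", ".rs": "rustc",
    ".c": "gcc", ".cpp": "g++", ".java": "javac",
}


def python_syntax_ok(path: str) -> bool:
    """Compile a Python file to a throwaway .pyc; False on a syntax error."""
    with tempfile.TemporaryDirectory() as tmp:
        try:
            py_compile.compile(path, cfile=os.path.join(tmp, "check.pyc"), doraise=True)
            return True
        except py_compile.PyCompileError:
            return False


def missing_checkers(extensions) -> list[str]:
    """Names of external tools needed for these extensions but not found on PATH."""
    missing = set()
    for ext in extensions:
        tool = EXTERNAL_CHECKERS.get(ext)
        if tool is not None and shutil.which(tool) is None:
            missing.add(tool)
    return sorted(missing)
```

Checking availability up front lets a run skip validation for languages whose tool is not installed instead of failing partway through.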
Critical -- blocks shipping:
- No pyproject.toml -- CI runs python -m build which will fail without it. The code_normalizer_pro.egg-info/ dir shows packaging was attempted but no config file exists in the tree. Create pyproject.toml with src layout and console_scripts entry point before running Day 1 tasks.
- code_normalizer_pro/cli.py has a broken import: from code_normalize_pro import main. After pip install, Python looks for a module named code_normalize_pro in site-packages, not in src/. Without a proper src layout in pyproject.toml, the installed CLI command will fail on launch.
Code bugs worth fixing:
- Cache default is on in __init__ but the --cache flag implies opt-in and --no-cache implies opt-out. The flags and the default contradict each other. Pick one direction and make the help text match.
- --parallel --in-place silently disables backups. process_file_worker passes create_backup=False but backup logic only lives inside process_file. Users running parallel mode have no backups. Either warn loudly or fix it.
- walk_and_process and process_file both increment total_files for the same files. Summary stats will show inflated counts.
- .normalize-cache.json lands in CWD, not the target directory. Running the tool against three different projects from the same shell session corrupts the cache. Pass root / CACHE_FILE to CacheManager in walk_and_process.
- --dry-run always exits 0 even when it finds files needing normalization. CI pipelines need a non-zero exit to catch violations. Add --fail-on-changes or make dry-run exit 1 when changes are detected.
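One way the dry-run exit code fix could look, assuming the opt-in route: a standalone argparse sketch in which --fail-on-changes is the proposed flag (it does not exist in the tool yet) and files_needing_changes stands in for whatever count the real run collects.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser with only the two flags relevant to this sketch."""
    parser = argparse.ArgumentParser(prog="normalize-sketch")
    parser.add_argument("--dry-run", action="store_true",
                        help="report changes without writing files")
    # Proposed flag: not implemented in the tool at the time of writing.
    parser.add_argument("--fail-on-changes", action="store_true",
                        help="with --dry-run, exit 1 if any file needs normalization")
    return parser


def exit_code(args: argparse.Namespace, files_needing_changes: int) -> int:
    """Return 1 only when a dry run was asked to fail and found work to do."""
    if args.dry_run and args.fail_on_changes and files_needing_changes > 0:
        return 1
    return 0
```

Keeping the non-zero exit behind a flag preserves the current behavior for anyone who already scripts around --dry-run, while CI can opt in.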
Cleanup:
- code_normalize_pro.py at root -- stale copy. Real file is in src/. Delete it.
- roadmaps/New Text Document.txt -- empty temp file. Delete it.
- roadmaps/talking about code.txt -- saved AI chat session. Delete or move to docs/.
- All README_20260220_*.md.bak files throughout the tree -- ReadmeForge backups.
- config/settings.py is a clean env-var loader but nothing imports it. Either wire it into code_normalize_pro.py or remove it.
- README_PRO.md at root duplicates docs/README.md. Consolidate.
- restore_report.json and smoke_report.json at root -- generated artifacts, add to .gitignore.
- PROJECT_STATUS.md says roadmap docs are "coming soon" -- all 6 exist. Stale.
EXECUTION_PLAN.md has a 7-day checklist. As of 2026-03-10, nothing started.
Before Day 1 tasks will work, pyproject.toml needs to exist (see the first critical issue above).
Day 1 after pyproject.toml is in place:
Set-Location C:\Dev\PROJECTS\CODE
.\.venv\Scripts\Activate.ps1
python -m pytest -q
pip install -e .
code-normalizer-pro --help
Full release steps: see docs/release/alpha_release_checklist.md
.github/workflows/ci.yml runs on push to main/master and on PRs:
- Python 3.11
- pip install from requirements.txt
- CLI smoke check (main.py and src/code_normalize_pro.py --help)
- pytest -q
- python -m build (sdist + wheel)
Note: python -m build requires pyproject.toml. CI will fail until that exists.
| Version | Date | Changes |
|---|---|---|
| v3.0 | 2026-02-09 | Parallel processing, SHA256 caching, pre-commit hooks, multi-language syntax, interactive mode |
| v2.0 | 2026-02-09 | Dry-run, in-place editing, backups, tqdm, detailed stats |
| v1.0 | -- | Basic encoding fix, CRLF, whitespace |
Package version: 3.0.1 (set in code_normalizer_pro/__init__.py)
Developer: MR (Michael Rawls Jr.) -- Houston, TX -- GitHub: MRJR0101