GhostRootSec/langdetect

langdetect

Content-based programming language detector CLI.

langdetect identifies a file's language by analyzing syntax patterns in the file content, not by filename extension.

Features

  • Detects 50+ programming languages
  • Uses weighted syntax fingerprints
  • Reports best match plus alternate likely matches
  • Works even when file extensions are wrong or missing

Install

Local editable install (run from the repository root):

pip install -e .

Usage

langdetect path/to/file
langdetect path/to/file --verbose

Example output:

Best match    : Scala (88.6%)
Also possible : Nim (23% rel.)  |  Kotlin (22% rel.)

How It Works

  • The detector scores each language against regex-based fingerprints.
  • The highest score is reported as the best match.
  • Confidence is based on score separation between the top and runner-up languages.
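
The steps above can be illustrated with a minimal sketch. It is not the project's actual code: the `FINGERPRINTS` table, the function names, the weights, and the exact confidence formula (top score minus runner-up, divided by top score) are all assumptions chosen for illustration. The real detector may weight and normalize scores differently.

```python
import re

# Hypothetical fingerprints: (regex, weight) pairs per language.
# The patterns and weights are illustrative, not langdetect's real tables.
FINGERPRINTS = {
    "Python": [(r"^\s*def \w+\(.*\):", 3.0), (r"^\s*import \w+", 1.0)],
    "Go": [(r"^package \w+", 3.0), (r"\bfunc \w+\(", 2.0)],
    "Rust": [(r"\bfn \w+\(", 2.0), (r"\blet mut\b", 3.0)],
}


def score_languages(text):
    """Return {language: weighted count of fingerprint matches}."""
    scores = {}
    for lang, patterns in FINGERPRINTS.items():
        total = 0.0
        for pattern, weight in patterns:
            hits = len(re.findall(pattern, text, flags=re.MULTILINE))
            total += weight * hits
        scores[lang] = total
    return scores


def detect(text):
    """Return (best_language, confidence, alternates).

    Confidence is assumed here to be the gap between the top and
    runner-up scores, relative to the top score. Alternates carry
    their score relative to the top score.
    """
    ranked = sorted(score_languages(text).items(),
                    key=lambda item: item[1], reverse=True)
    best, top = ranked[0]
    runner_up = ranked[1][1]
    if top == 0:
        # No fingerprint matched at all.
        return None, 0.0, []
    confidence = (top - runner_up) / top
    alternates = [(lang, score / top) for lang, score in ranked[1:] if score > 0]
    return best, confidence, alternates


if __name__ == "__main__":
    sample = "fn main() {\n    let mut x = 1;\n}\nimport foo\n"
    print(detect(sample))
```

In this sketch a file that matches only one language's fingerprints gets a confidence of 1.0, while a file that scores nearly equally for two languages gets a confidence close to 0, which is the intuition behind reporting alternate matches alongside the best one.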

Roadmap

  • Add fixture-driven accuracy tests
  • Add optional JSON output mode
  • Add plugin support for custom language fingerprints

Contributing

Please read CONTRIBUTING.md before opening a pull request.

Security

To report vulnerabilities, see SECURITY.md.

License

MIT License. See LICENSE.
