A lightweight Java library for automatic code language detection of an input text.

Valkryst/VCodeLanguageDetection

This is a lightweight library that automatically detects the programming language of an input text.

The current implementation is far from perfect and will sometimes misidentify the language of a code snippet. Longer code snippets are more likely to be identified correctly.

Installation

VCodeLanguageDetection is hosted on the JitPack package repository, which supports Gradle, Maven, and sbt.

Gradle

Add JitPack to your build.gradle at the end of repositories.

allprojects {
	repositories {
		...
		maven { url 'https://jitpack.io' }
	}
}

Add VCodeLanguageDetection as a dependency.

dependencies {
	implementation 'com.github.Valkryst:VCodeLanguageDetection:1.0.0'
}

Maven

Add JitPack as a repository.

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Add VCodeLanguageDetection as a dependency.

<dependency>
    <groupId>com.github.Valkryst</groupId>
    <artifactId>VCodeLanguageDetection</artifactId>
    <version>1.0.0</version>
</dependency>

Scala SBT

Add JitPack as a resolver.

resolvers += "jitpack" at "https://jitpack.io"

Add VCodeLanguageDetection as a dependency.

libraryDependencies += "com.github.Valkryst" % "VCodeLanguageDetection" % "1.0.0"

Usage

Get an instance of LanguageDetector, then call its detect method with the code whose language you want to detect. The method returns a map of languages to probabilities. The first entry in the map, which you can retrieve with .entrySet().iterator().next(), is the most likely language.

public class Example {
  public static void main(final String[] args) {
    final var code = """
    public class Example {
        public static void main(final String[] args) {
            System.out.println("Hello, World!");
        }
    }
    """;

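    // Run detection on the snippet; the result maps each language to its probability.
    final var detector = LanguageDetector.getInstance();
    final var languages = detector.detect(code);

    // Print every language with its probability.
    System.out.println("Probabilities:");
    for (final var entry : languages.entrySet()) {
      System.out.println("\t" + entry.getKey() + ": " + entry.getValue());
    }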
  }
}
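
Because detection can misidentify short snippets, it may help to accept the top result only when its probability clears a threshold. The following is a minimal sketch of that idea, not part of the library: the TopLanguage class, the pick method, the threshold value, and the sample keys and scores are all hypothetical, and a plain map stands in for the result of detect. Values are read as Number because the exact value type of the returned map is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class TopLanguage {
    /**
     * Returns the key with the highest score, or an empty Optional when the
     * map is empty or the highest score is below the threshold.
     * Hypothetical helper; it does not rely on the map's iteration order.
     */
    static Optional<String> pick(final Map<String, ? extends Number> scores, final double threshold) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;

        // Scan every entry and remember the highest-scoring language.
        for (final Map.Entry<String, ? extends Number> entry : scores.entrySet()) {
            final double score = entry.getValue().doubleValue();
            if (score > bestScore) {
                best = entry.getKey();
                bestScore = score;
            }
        }

        return (best != null && bestScore >= threshold) ? Optional.of(best) : Optional.empty();
    }

    public static void main(final String[] args) {
        // Stand-in for the map returned by detect; keys and values are made up.
        final Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("java", 0.62);
        scores.put("csharp", 0.21);
        scores.put("python", 0.0);

        // Falls back to "unknown" when no language clears the threshold.
        System.out.println(pick(scores, 0.5).orElse("unknown"));
    }
}
```

In real use, the map passed to pick would be the result of LanguageDetector.getInstance().detect(code), and the threshold would need tuning against your own snippets.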

If you're using RSyntaxTextArea, you can use the following method to detect the language and return the appropriate syntax style.

private String detectSyntaxStyle(final @NonNull String code) {
  final var languages = LanguageDetector.getInstance().detect(code);
  final var entry = languages.entrySet().iterator().next();

  if (entry.getValue() == 0) {
    return RSyntaxTextArea.SYNTAX_STYLE_NONE;
  }

  try {
    final var styleName = "SYNTAX_STYLE_" + entry.getKey().toUpperCase();
    final var field = SyntaxConstants.class.getDeclaredField(styleName);
    return (String) field.get(null);
  } catch (final NoSuchFieldException | IllegalAccessException e) {
    return RSyntaxTextArea.SYNTAX_STYLE_NONE;
  }
}

Supported Languages

  • C
  • C++
  • C#
  • Clojure
  • D
  • Dart
  • Delphi
  • Fortran
  • Go
  • Java
  • JavaScript
  • Lua
  • Perl
  • PHP
  • Python
  • Ruby
  • SQL

Credits & Inspiration

  • highlight.js - For the lists of keywords for each language.