
NLP Engine for Windows

Prebuilt Windows binaries of the VisualText NLP Engine, packaged together with the data/ knowledge bases and a Python wrapper so the engine can be downloaded and used out of the box — no compilation required.

The NLP Engine is the runtime for NLP++, a domain-specific language for natural-language analyzers. This repository tracks upstream releases automatically and republishes them as a Windows-ready bundle.

Companion Repositories

The NLP Engine is distributed per platform. Pick the one that matches your OS:

Platform Repository
Windows VisualText/nlp-engine-windows (this repo)
Linux VisualText/nlp-engine-linux
macOS VisualText/nlp-engine-mac
Source VisualText/nlp-engine

For production use from Python, prefer the NLPPlus Python package instead of the simple wrapper shipped here.

Repository contents

Path Description
nlp.exe The NLP Engine command-line executable (renamed from upstream nlpw.exe).
icudt*.dll, icuuc*.dll, icuin*.dll ICU runtime DLLs that nlp.exe links against. The version suffix (74, 78, …) tracks whichever ICU upstream is currently using.
data/ NLP Engine data directory containing the rfb (rules-from-builder) knowledge base. nlp.exe is invoked with the folder that contains data/ (the repository root) as -WORK.
compile-libs/ Headers (include/Api/, include/cs/) and engine static libraries (lib/{prim,kbm,consh,words,lite}.lib) used to link a compiled analyzer/KB into a .dll. Populated by the workflow from upstream's nlpengine-compile-libs.zip.
scripts/compile-analyzer.{ps1,bat} Compile the analyzer (run+kb) into a single DLL, staged as <analyzer>\bin\run.dll, runu.dll, kb.dll, and kbu.dll.
python/ Git submodule pointing at VisualText/python — a thin Python wrapper (NLPEngine class) that shells out to nlp.exe.
.version-flag Records the upstream release tag currently vendored (e.g. v3.0.2). Used by the update workflow to detect stale builds.
.github/workflows/nlp-engine-build.yml Automation that pulls the latest upstream release and commits/tags it here.
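The .version-flag file is what lets the update workflow notice a stale build. The comparison can be sketched in Python as below. This is a hypothetical helper, not code from nlp-engine-build.yml: it assumes the file holds a single tag such as v3.0.2, and it treats a manual run as "always update" to mirror the workflow_dispatch behaviour described under Triggers.

```python
from pathlib import Path


def read_version_flag(repo_dir):
    """Return the vendored upstream tag recorded in .version-flag (e.g. "v3.0.2")."""
    return (Path(repo_dir) / ".version-flag").read_text(encoding="utf-8").strip()


def needs_update(vendored_tag, latest_tag, force=False):
    """Hypothetical staleness check.

    A forced (manual) run always updates; otherwise update only when the
    latest upstream tag differs from the one currently vendored.
    """
    return force or vendored_tag != latest_tag
```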

Quick start

Option 1: Download a release

Grab the latest tagged release from this repository's Releases page. The tag mirrors the upstream VisualText/nlp-engine version (e.g. v3.1.9).

Option 2: Clone the repository

git clone --recurse-submodules https://github.com/VisualText/nlp-engine-windows.git
cd nlp-engine-windows

The --recurse-submodules flag pulls in the python/ submodule. If you forgot it, run:

git submodule update --init --recursive

Running nlp.exe

nlp.exe is invoked with an analyzer folder, a working directory containing the data/ tree, and an input text file:

.\nlp.exe -ANA <path-to-analyzer-folder> -WORK <path-to-this-repo> <path-to-input-text-file>

Add -DEV to enable developer logging output.
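For scripting, the invocation can be wrapped with subprocess. This is a minimal sketch, not part of the bundled wrapper: build_nlp_command and run_nlp are hypothetical names, only the flags documented in this README are used (-COMPILED is covered under "Compiling an analyzer to native DLLs"), and placing -COMPILED and -DEV before -ANA simply mirrors the -COMPILED example; the README does not say whether nlp.exe cares about flag order.

```python
import subprocess
from pathlib import Path


def build_nlp_command(engine_dir, analyzer_dir, input_file, dev=False, compiled=False):
    """Assemble an nlp.exe argument list from the flags documented in this README.

    engine_dir is the repository root: it holds nlp.exe and the data/ tree,
    so it doubles as the -WORK directory.
    """
    engine_dir = Path(engine_dir)
    cmd = [str(engine_dir / "nlp.exe")]
    if compiled:
        cmd.append("-COMPILED")  # run from the DLLs built by compile-analyzer.ps1
    if dev:
        cmd.append("-DEV")  # developer logging output
    cmd += ["-ANA", str(analyzer_dir), "-WORK", str(engine_dir), str(input_file)]
    return cmd


def run_nlp(engine_dir, analyzer_dir, input_file, **flags):
    """Run nlp.exe (Windows only) and return its captured stdout as text."""
    cmd = build_nlp_command(engine_dir, analyzer_dir, input_file, **flags)
    # cwd=engine_dir so relative paths such as data\rfb resolve against the repo.
    result = subprocess.run(cmd, cwd=engine_dir, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_nlp(r"C:\path\to\nlp-engine-windows", r"data\rfb", r"data\rfb\input\text.txt") corresponds to running nlp.exe against the bundled rfb analyzer.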

nlp.exe and the ICU DLLs must remain in the same folder — Windows resolves the DLLs from the executable's directory at load time.
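Because a missing ICU DLL only shows up as a load failure, a quick preflight check can help. The sketch below is a hypothetical helper: it matches the three ICU families by wildcard, since the version digits change with upstream ICU bumps, and reports anything that is not sitting next to nlp.exe.

```python
from pathlib import Path

# The three ICU DLL families nlp.exe links against; the digits (74, 78, ...) vary.
ICU_PATTERNS = ("icudt*.dll", "icuuc*.dll", "icuin*.dll")


def preflight(engine_dir):
    """Return the files missing from engine_dir (empty list means ready to run)."""
    engine_dir = Path(engine_dir)
    problems = []
    if not (engine_dir / "nlp.exe").is_file():
        problems.append("nlp.exe")
    # Each ICU family must have at least one matching DLL in the same folder.
    problems += [p for p in ICU_PATTERNS if not any(engine_dir.glob(p))]
    return problems
```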

Using from Python

The bundled python/ submodule contains an NLPEngine class that wraps the executable for scripting use. See python/README.md for details. Minimal example:

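# Assumes the repository root (the parent of python/) is on sys.path.
from python.nlpengine import NLPEngine

engine = NLPEngine(
    engineDir=r"C:\path\to\nlp-engine-windows",  # folder containing nlp.exe and data/
    analyzersDir=r"C:\path\to\analyzers",        # folder holding your analyzer folders
)
# Analyze sample.txt with the analyzer named my-analyzer.
engine.analyzeFile("my-analyzer", "sample.txt")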

For production workloads, prefer the NLPPlus Python package, which links against the engine directly instead of shelling out.

Compiling an analyzer to native DLLs

By default nlp.exe runs analyzers fully interpreted from the .nlp source. In the engine's -COMPILED mode, both the analyzer body (the rule passes) and the knowledge base are compiled to native DLLs, which the engine loads with LoadLibrary at runtime. The analyzer then runs entirely from compiled code, so edits to .nlp files have no effect on the output until you recompile.
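Since compiled output silently ignores newer .nlp edits, a modification-time check can flag when a recompile is due. This is a hypothetical sketch: it assumes the .nlp pass files live somewhere under the analyzer folder and that compile-analyzer.ps1 has written bin\run.dll inside it.

```python
from pathlib import Path


def compiled_is_stale(analyzer_dir):
    """True if bin/run.dll is missing or any .nlp file is newer than it."""
    analyzer_dir = Path(analyzer_dir)
    dll = analyzer_dir / "bin" / "run.dll"
    if not dll.exists():
        return True  # never compiled
    built = dll.stat().st_mtime
    # Any source edited after the last build means the DLL no longer matches it.
    return any(src.stat().st_mtime > built for src in analyzer_dir.rglob("*.nlp"))
```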

Script: scripts/compile-analyzer.ps1
What it does: Runs nlp.exe -COMPILE (emits the analyzer C++ trees under <analyzer>\run and <analyzer>\kb), then links everything into a single DLL against compile-libs/. The DLL exports both run_analyzer(Parse*) and kb_setup(void*) (the engine's codegen emits both).
Output: <analyzer>\bin\run.dll, <analyzer>\bin\runu.dll, <analyzer>\bin\kb.dll, <analyzer>\bin\kbu.dll

The same DLL is staged under all four filenames so the engine's load paths find it whether it's looking for the ANSI or UNICODE build flavour (lite/nlp.cpp:1242 / cs/libconsh/cg.cpp:168).
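The staging step amounts to copying one file under four names. A Python sketch of the idea follows; the real step lives in compile-analyzer.ps1, and stage_compiled_dll is a hypothetical name used only for illustration.

```python
import shutil
from pathlib import Path

# run/kb, each with and without the u suffix, covering the ANSI and UNICODE load paths.
STAGED_NAMES = ("run.dll", "runu.dll", "kb.dll", "kbu.dll")


def stage_compiled_dll(built_dll, bin_dir):
    """Copy one linked DLL into bin_dir under all four expected names."""
    built_dll = Path(built_dll).resolve()
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in STAGED_NAMES:
        target = bin_dir / name
        if target.resolve() != built_dll:  # don't copy a file onto itself
            shutil.copyfile(built_dll, target)
        staged.append(target)
    return staged
```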

A .bat shim of the same name ships alongside the .ps1, so the script can be run either directly from PowerShell or from cmd.exe.

Prerequisites

  • Visual Studio 2022 (or Build Tools) with the Desktop development with C++ workload — provides cl.exe, lib.exe, dumpbin.exe, and VsDevCmd.bat. The script locates these automatically via vswhere.exe.
  • CMake ≥ 3.16 (on PATH).

Usage

# Default: full-analyzer compile (run + kb):
.\scripts\compile-analyzer.ps1 data\rfb data\rfb\input\text.txt

# Or from cmd.exe via the .bat shim:
scripts\compile-analyzer.bat data\rfb data\rfb\input\text.txt

# Legacy: KB-only compile (matches the pre-NLP-ENGINE-WINDOWS-007 behaviour):
.\scripts\compile-analyzer.ps1 -KbOnly data\rfb data\rfb\input\text.txt

# Run with the compiled artifacts:
.\nlp.exe -COMPILED -ANA data\rfb -WORK . data\rfb\input\text.txt

What you should see in the -COMPILED output for a successful round-trip:

[CG: Trying to load compiled KB.]
[Loading compiled kb: data\rfb\bin\kb.dll]
[Loaded compiled kb library]
[Loading compiled analyzer data\rfb\bin\run.dll]
[Loaded compiled analyzer]
... parse output ...
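To automate the round-trip check, the "Loaded" lines above can be searched for in the captured output. A minimal sketch, assuming those lines appear in the text you capture from nlp.exe (the README does not say which stream they are written to):

```python
# "Loaded" lines printed by a successful -COMPILED round-trip (from the sample output above).
EXPECTED_MARKERS = (
    "[Loaded compiled kb library]",
    "[Loaded compiled analyzer]",
)


def missing_load_markers(output):
    """Return the expected load markers absent from captured nlp.exe output."""
    return [marker for marker in EXPECTED_MARKERS if marker not in output]
```

An empty list means both the compiled KB and the compiled analyzer were loaded.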

How ICU is linked

Upstream only ships ICU as runtime DLLs (icudt78.dll, icuin78.dll, icuuc78.dll) — no .lib import libraries. On its first run, compile-analyzer.ps1 generates the import libs from the DLLs:

  1. dumpbin /exports icu*.dll lists every exported symbol.
  2. The exports are written to a .def file under compile-libs\lib\.
  3. lib.exe /def:... /machine:X64 /out:icu*.lib produces the import library.

Subsequent runs reuse the generated .lib files. The ICU version digits (78) match the bundled DLLs and will move in lock-step whenever upstream bumps ICU.
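The three steps above can be sketched in Python as follows. This is an illustration, not the PowerShell in compile-analyzer.ps1: the regex assumes dumpbin's usual "ordinal hint RVA name" column layout, the helper names are hypothetical, and the lib.exe arguments are the ones listed in step 3.

```python
import re
from pathlib import Path

# One export row in `dumpbin /exports` output: ordinal, hint, RVA, name.
# Forwarded exports have no RVA column and are skipped by this pattern.
EXPORT_ROW = re.compile(r"^\s+(\d+)\s+[0-9A-Fa-f]+\s+[0-9A-Fa-f]{8}\s+(\S+)")


def exports_to_def(dumpbin_output, library):
    """Turn `dumpbin /exports <dll>` text into the contents of a .def file."""
    names = [m.group(2) for line in dumpbin_output.splitlines()
             if (m := EXPORT_ROW.match(line))]
    lines = [f"LIBRARY {library}", "EXPORTS"] + [f"    {name}" for name in names]
    return "\n".join(lines) + "\n"


def lib_command(def_path, lib_path):
    """The lib.exe invocation that turns a .def file into an x64 import library."""
    return ["lib.exe", f"/def:{def_path}", "/machine:X64", f"/out:{lib_path}"]


def needs_import_lib(lib_path):
    """Reuse a previously generated .lib; only regenerate when it is missing."""
    return not Path(lib_path).exists()
```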

How releases are produced

The nlp-engine-build.yml workflow keeps this repo in sync with upstream:

  1. Queries VisualText/nlp-engine for the latest GitHub release.
  2. Downloads the Windows assets — nlpengine.zip (the data/ tree), nlpw.exe, the three ICU DLLs (icudt*.dll, icuuc*.dll, icuin*.dll), and nlpengine-compile-libs.zip (headers + engine static libraries used by the compile scripts; optional — skipped if absent on a given release). Asset matching is version-agnostic, so ICU bumps don't require workflow edits.
  3. Removes the previously committed binaries in a dedicated cleanup commit (so git stores a clean diff rather than two layered binary blobs).
  4. Unzips nlpengine.zip, renames nlpw.exe to nlp.exe, extracts nlpengine-compile-libs.zip to compile-libs/, and commits the new files.
  5. Tags the commit with the upstream version and publishes a GitHub release.
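The version-agnostic asset matching in step 2 can be illustrated with wildcard patterns. The workflow itself does this in JavaScript via actions/github-script and its findAsset calls; the Python below is only a transliteration of the idea, with hypothetical names, that fails with the full asset list the way the error described under "When the workflow fails" does.

```python
from fnmatch import fnmatchcase

# Version-agnostic patterns for the Windows assets listed in step 2.
REQUIRED_ASSETS = ("nlpengine.zip", "nlpw.exe", "icudt*.dll", "icuuc*.dll", "icuin*.dll")
OPTIONAL_ASSETS = ("nlpengine-compile-libs.zip",)


def find_asset(asset_names, pattern, required=True):
    """Return the first asset name matching pattern, or fail with the full list."""
    for name in asset_names:
        if fnmatchcase(name, pattern):
            return name
    if required:
        raise LookupError(
            f"Could not find {pattern} in release; available assets: {', '.join(asset_names)}"
        )
    return None  # optional asset absent on this release


def select_assets(asset_names):
    """Map each pattern to the concrete asset name for this release."""
    chosen = {p: find_asset(asset_names, p) for p in REQUIRED_ASSETS}
    chosen.update({p: find_asset(asset_names, p, required=False) for p in OPTIONAL_ASSETS})
    return chosen
```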

Triggers

  • repository_dispatch with event type nlp-engine-release — fired by the upstream repo on every new release.
  • workflow_dispatch — manual trigger from the Actions tab. Forces an update even if the tag already exists locally.
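For reference, a repository_dispatch event is sent through GitHub's REST endpoint POST /repos/{owner}/{repo}/dispatches. The sketch below only builds such a request; how upstream actually fires the event is not shown in this README, and build_dispatch_request is a hypothetical helper. The token must be able to trigger workflows on the target repository.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, event_type="nlp-engine-release", payload=None):
    """Build (but do not send) a GitHub repository_dispatch request."""
    body = {"event_type": event_type}
    if payload:
        body["client_payload"] = payload  # optional extra data for the workflow
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it.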

When the workflow fails

If actions/github-script throws Could not find <asset> in release …, an upstream asset has been renamed. The error message includes the full list of available assets — update the matcher in nlp-engine-build.yml (the findAsset calls) to reflect the new name.


License

MIT — matches the upstream NLP Engine license.
