Add experimental Jieba segmentation plugin and standalone packaging support#1093
Add experimental Jieba segmentation plugin and standalone packaging support#1093frankslin merged 20 commits intoBYVoid:masterfrom
Conversation
…plugin - Refactored C++ calls to use the modern cppjieba::Jieba interface. - Updated CMakeLists and BUILD.bazel to point to local plugin dependencies. - Added portable_util export to Bazel test framework to support cross-package Jieba plugin integration tests.
…lone project - Adapted plugins/jieba/CMakeLists.txt to dynamically detect CMAKE_CURRENT_SOURCE_DIR. - Resolves upstream OpenCC C++ Headers dynamically via find_package(opencc) during standalone builds. - Guards internal component integrations (like test cases) when OpenCC root tree isn't present.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e0c9996fa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 298e2bbc69
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ndows Remove the logic attempting to remove "msys-" prefix in mingw build config.
There was a problem hiding this comment.
💡 Codex Review
OpenCC/plugins/jieba/src/JiebaSegmentationPlugin.cpp
Lines 53 to 55 in 05fb781
Jieba resource existence checks are routed through IsReadableFile, which uses narrow std::ifstream(path.c_str()); on Windows this fails for UTF-8/non-ASCII paths even when files exist. In those environments the plugin incorrectly returns OPENCC_ERROR_PLUGIN_RESOURCE_MISSING for dict_path/model_path/aux files, breaking conversions for valid installations under localized directory names.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Also update documentation.
There was a problem hiding this comment.
💡 Codex Review
OpenCC/plugins/jieba/src/JiebaSegmentationPlugin.cpp
Lines 198 to 202 in 413535f
CreateJiebaSegmentation returns failure when required resources are missing, but these early-return paths never reset *args->out to null. The plugin ABI contract in OpenCCPlugin.h says create() failures must leave *out == NULL; otherwise callers that reuse a non-null handle variable can keep a stale pointer and may later destroy or reuse an invalid handle after initialization failed.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f6a2c682e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5fa48b5904
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 450d480944
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 301c7229c4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d42994ebd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| #ifdef _WIN32 | ||
| "_main/plugins/jieba/opencc-jieba.dll", | ||
| "_main/plugins/jieba/opencc-jieba.exe", | ||
| #elif defined(__APPLE__) |
There was a problem hiding this comment.
Include all Windows DLL variants in Bazel plugin lookup
In PluginDirectory() the Bazel/Windows candidate list only checks opencc-jieba.dll and opencc-jieba.exe, but the host loader also accepts libopencc-<type>.dll and msys-opencc-<type>.dll (see GetPluginFileNames in src/PluginSegmentation.cpp). When Bazel emits one of those alternate DLL names (for example with MinGW-flavored toolchains), this test path resolves to "", sets OPENCC_SEGMENTATION_PLUGIN_PATH to empty, and fails with a false plugin-not-found error even though the plugin was built.
Useful? React with 👍 / 👎.
|
In the PR description please also briefly explain how the plugin is loaded into opencc (statically linked, dynamically linked or dynamically loaded). |
|
|
||
| PREFIX = /usr | ||
| REL_BUILD_DOCUMENTATION ?= ON | ||
| REL_BUILD_DOCUMENTATION ?= OFF |
There was a problem hiding this comment.
This was needed to resolve a CI failure due to a missing doxygen deps in the GitHub Action runtime. I forgot how it was triggered, but it didn't look like it always repro on all platform combinations. If generating doc is still needed, we can run it one-time in the release workflow and only in one architecture.
| ) | ||
|
|
||
| cc_binary( | ||
| name = "opencc-jieba", |
There was a problem hiding this comment.
Is this binary target a library to be dynamically linked to opencc? If so, I would prefer a more explicit name like libopencc-jieba
There was a problem hiding this comment.
The "binary" target outputs a platform dependent dynamic library following each platform's naming convention:
- Linux:
libopencc-jieba.so - macOS:
libopencc-jieba.dylib - Windows:
opencc-jieba.dll(ormsys-opencc-jieba.dllfor Mingw/msys)
Using a name like "lib..." could imply it's for a cc_library rule (which it isn't). And on Windows, the naming convention is to drop the "lib" prefix.
Summary
This PR adds the first loadable segmentation plugin for OpenCC and wires in an experimental
jieba-based segmenter.It includes:
plugins/jiebaimplementation backed bycppjiebas2twpandtw2spThe plugin is dynamically loaded during
openccexecution.What changed
Core plugin host (in earlier commit)
segmentation.typemmsegJieba plugin
cppjiebaunderplugins/jieba/deps, including required resources (in earlier commit)opencc-jiebashared library targets2twp_jieba.jsontw2sp_jieba.jsonBuild and packaging
plugins/jiebaplugins/jiebaas a standalone CMake project viafind_package(OpenCC)Tests
opencccommandopencc-jiebaplugin慰藉著城堡的士兵Notes
jiebaremains experimental and is exposed through plugin-specific configs instead of replacing built-in segmentation.opencc-jiebaseparately.