Jieba

A Rustler bridge to jieba-rs, the Rust Jieba implementation.

It lets Elixir code use the jieba-rs segmenter to segment Chinese text.

The API is mostly a direct mapping of the Rust API. The constructors have all been combined under one new/2 API that allows the code to feel less imperative.

The KeywordExtract functionality for both TFIDF and TextRank is also provided. However, in jieba-rs both of these Rust structs hold a reference to the underlying Jieba instance, which makes it hard to project them into the BEAM while respecting the Rust lifetime rules and ensuring mutual exclusion across threads. They are therefore exported as single-use functions that construct and tear down the TFIDF and TextRank instances on every call. This is possibly slow, but making it fast would require modifying the jieba-rs API so that neither TFIDF nor TextRank held a reference to the underlying Jieba instance on construction and instead took the wanted instance on the extract_tags() call.
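The lifetime constraint described above can be illustrated with a small Rust sketch. Every type here (Segmenter, BorrowingExtractor, OwnedExtractor) and the extract_once helper are hypothetical stand-ins invented for illustration; none of them are the jieba-rs types, and the toy "extraction" just keeps the first known words. The sketch only assumes what the paragraph states: an extractor that borrows its segmenter at construction versus one that receives it on each call.

```rust
// Hypothetical stand-in for a segmenter; NOT the jieba-rs Jieba type.
pub struct Segmenter {
    dict: Vec<String>,
}

impl Segmenter {
    pub fn new(words: &[&str]) -> Self {
        Segmenter {
            dict: words.iter().map(|w| w.to_string()).collect(),
        }
    }

    // Toy segmentation: split on whitespace and keep only dictionary words.
    pub fn cut<'s>(&self, sentence: &'s str) -> Vec<&'s str> {
        sentence
            .split_whitespace()
            .filter(|w| self.dict.iter().any(|d| d.as_str() == *w))
            .collect()
    }
}

// Shape 1 (hypothetical): the extractor borrows the segmenter when it is
// constructed, so it carries the lifetime 'a and cannot outlive the
// segmenter. A value like this is awkward to hand to another runtime as a
// long-lived resource.
pub struct BorrowingExtractor<'a> {
    segmenter: &'a Segmenter,
}

impl<'a> BorrowingExtractor<'a> {
    pub fn new(segmenter: &'a Segmenter) -> Self {
        BorrowingExtractor { segmenter }
    }

    pub fn extract_tags<'s>(&self, sentence: &'s str, top_k: usize) -> Vec<&'s str> {
        self.segmenter.cut(sentence).into_iter().take(top_k).collect()
    }
}

// Single-use wrapper: build the borrowing extractor, use it once, and drop
// it before returning, so no borrowed value escapes the call.
pub fn extract_once<'s>(segmenter: &Segmenter, sentence: &'s str, top_k: usize) -> Vec<&'s str> {
    BorrowingExtractor::new(segmenter).extract_tags(sentence, top_k)
}

// Shape 2 (hypothetical): the extractor owns only its own configuration and
// takes the segmenter on each call, so it has no lifetime parameter and can
// be kept around between calls.
pub struct OwnedExtractor {
    top_k: usize,
}

impl OwnedExtractor {
    pub fn new(top_k: usize) -> Self {
        OwnedExtractor { top_k }
    }

    pub fn extract_tags<'s>(&self, segmenter: &Segmenter, sentence: &'s str) -> Vec<&'s str> {
        segmenter.cut(sentence).into_iter().take(self.top_k).collect()
    }
}
```

The single-use functions in this package correspond roughly to the extract_once shape: the cost of constructing the extractor is paid on every call. The second shape is the kind of API change the paragraph says would be needed to avoid that cost.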

Installation

If available in Hex, the package can be installed by adding jieba to your list of dependencies in mix.exs:

def deps do
  [
    {:jieba, "~> 0.3.1"}
  ]
end

Versions prior to 0.2.0

Versions prior to 0.2.0 were written by mjason (lmj on hex) and released from the mjason/jieba_ex source tree. They exposed a single Jieba.cut(sentence) function that used a single, unsynchronized, static instance of Jieba on the Rust side, loaded with the default dictionary. The cut(sentence) call was hardcoded to have hmm=false.

In March 2024, this codebase was written to help with the Visual Fonts project, without its author realizing that an existing codebase was available. This codebase exposed more of the Rust API. After talking with mjason, it was decided to switch to this codebase and to increment the version number to signify the API break.

The 0.3.z versions still include the Jieba.cut/1 interface, but it is marked as deprecated. In 1.0.0, this API will be removed in favor of an API that is not based on a global object.
