LdBeth/emt

 
 

emt.el

Introduction

EMT stands for Emacs MacOS Tokenizer.

This package uses macOS’s built-in NLP tokenizer to tokenize CJK words and operate on them in Emacs.

Installation

Requirements

  • macOS 10.15 or later
  • Emacs 26.1 or later, built with dynamic module support (pass --with-modules to configure when compiling Emacs)

Build dynamic module

Pre-built (recommended)

Download the pre-built module from the releases section and place the dylib file at emt-lib-path (by default, modules/libEMT.dylib inside your Emacs configuration directory, normally ~/.emacs.d/modules/libEMT.dylib).

Manually build

  • Install Xcode.
  • Build the module using emt-compile-module, which compiles and copies the module to emt-lib-path.

If you encounter the following error:

No such module “PackageDescription”

run the following command and try again:

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

Install package

Install with straight and use-package:

(use-package emt
  :straight (:host github :repo "roife/emt")
  :hook (after-init . emt-mode))

Customization

emt-use-cache

If non-nil, cache the results of tokenization. Default is t.

emt-cache-lru-size

The size of the LRU cache. Default is 50.

emt-lib-path

The path to the dynamic library file for emt. Default is ~/.emacs.d/modules/libEMT.dylib.
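As a rough sketch, the three options above could be set through the :custom keyword of use-package. The recipe and hook are copied from the installation snippet. The cache size of 100 is an arbitrary illustrative value, and the other two values simply restate the documented defaults.

```elisp
;; Sketch only: assumes emt is installed through the straight recipe
;; shown in the installation section.
(use-package emt
  :straight (:host github :repo "roife/emt")
  :hook (after-init . emt-mode)
  :custom
  ;; Keep the tokenization cache enabled (the documented default).
  (emt-use-cache t)
  ;; Illustrative value only; the documented default is 50.
  (emt-cache-lru-size 100)
  ;; Location of the dynamic library (the documented default path).
  (emt-lib-path (expand-file-name "modules/libEMT.dylib" user-emacs-directory)))
```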

Usage

keymap: emt-mode-map

It remaps forward-word, backward-word, kill-word and backward-kill-word to their emt counterparts.

Minor mode

Enabling emt-mode calls emt-ensure, which loads the dynamic module, and activates emt-mode-map.

Functions

emt-word-at-point-or-forward

Return the word at point. If point is at a word boundary, return the word after it.

emt-word-at-point-or-backward

Return the word at point. If point is at a word boundary, return the word before it.

emt-compile-module

Compile and copy the module to emt-lib-path.

It takes an optional argument, path, which gives the location of the dynamic library. If omitted, path defaults to emt-lib-path.

emt-ensure

Load the dynamic module.

emt-split

Split a string into a list of words.

Return a list of cons cells, each pairing a word with its bounds (a cons of the beginning and ending positions of the word).

emt-split-without-bounds

Split a string into a list of words. Return only the list of words, without their bounds.

It is faster than emt-split.

emt-forward-word

CJK-compatible version of forward-word.

emt-backward-word

CJK-compatible version of backward-word.

emt-kill-word

CJK-compatible version of kill-word.

emt-backward-kill-word

CJK-compatible version of backward-kill-word.

emt-mark-word

CJK-compatible version of mark-word.
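The keymap description above lists remaps for four commands and does not mention mark-word. If emt-mode-map does not already cover it, a remap along these lines might be added to your configuration. This is a sketch that assumes the feature is named emt and that emt-mode-map is defined once the package loads.

```elisp
;; Sketch only: assumes the feature is named `emt' and that
;; `emt-mode-map' exists after the package is loaded.
(with-eval-after-load 'emt
  ;; Route the standard `mark-word' binding to the emt version
  ;; whenever emt-mode is active.
  (define-key emt-mode-map [remap mark-word] #'emt-mark-word))
```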

Acknowledgements

This package is inspired by jieba.el, a Chinese tokenizer for Emacs that uses jieba.

The dynamic module uses emacs-swift-module, which provides an interface for writing Emacs dynamic modules in Swift.
