

NodeJieba 简体中文

Introduction

NodeJieba provides Chinese word segmentation for Node.js, based on CppJieba.

Install

npm install nodejieba

Or with npmmirror.com:

npm install nodejieba --registry=https://registry.npmmirror.com --nodejieba_binary_host_mirror=https://registry.npmmirror.com/-/binary/nodejieba/

Usage

import { cut } from "nodejieba";

const result = cut("南京市长江大桥");
console.log(result);
//["南京市","长江大桥"]

See the test cases for details.

Initialization

Initialization is optional. If you do not initialize explicitly, the default dictionaries are loaded the first time cut is called.

To load the default dictionaries explicitly, call load:

import { load } from "nodejieba";

load();

load accepts optional dictionary parameters. If a dictionary parameter is omitted, its default value is used.

Dictionary description

  • dict: the main dictionary, with word weights and part-of-speech tags; using the default dictionary is recommended
  • hmmDict: the hidden Markov model dictionary; using the default dictionary is recommended
  • userDict: the user dictionary; adapt it to your use case
  • idfDict: IDF (inverse document frequency) data for keyword extraction
  • stopWordDict: the list of stop words for keyword extraction
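The fallback rule for dictionary parameters can be pictured with a small sketch. This is not NodeJieba's actual code: `resolveDictOptions` is a hypothetical helper, and the default values are placeholder strings standing in for the dictionary files bundled with the package. Only the five parameter names come from the list above.

```javascript
// Placeholder defaults (assumption): the real default paths are provided by
// the nodejieba package and are not reproduced here.
const PLACEHOLDER_DEFAULTS = {
  dict: "<default dict>",
  hmmDict: "<default hmmDict>",
  userDict: "<default userDict>",
  idfDict: "<default idfDict>",
  stopWordDict: "<default stopWordDict>",
};

// Hypothetical helper: fill in every dictionary parameter the caller omitted.
function resolveDictOptions(overrides = {}) {
  const resolved = { ...PLACEHOLDER_DEFAULTS };
  for (const [name, value] of Object.entries(overrides)) {
    if (!(name in PLACEHOLDER_DEFAULTS)) {
      throw new Error(`Unknown dictionary parameter: ${name}`);
    }
    // An explicitly undefined value is treated the same as a missing one.
    if (value !== undefined) {
      resolved[name] = value;
    }
  }
  return resolved;
}

// Only userDict is customized; the other four keep their defaults.
const options = resolveDictOptions({ userDict: "./my.user.dict.utf8" });
console.log(options);
```

In practice you would typically pass only the parameters you want to change, most often userDict, and leave the rest unset.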

POS Tagging

import { tag } from "nodejieba";

console.log(tag("红掌拨清波"));
//[ { word: '红掌', tag: 'n' },
//  { word: '拨', tag: 'v' },
//  { word: '清波', tag: 'n' } ]

See the test cases for details.
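Because tag returns an array of { word, tag } objects, ordinary array methods are enough to post-process it. The sketch below filters by tag using the sample output above as input data, so it runs without NodeJieba installed. `wordsWithTag` is a hypothetical helper, and reading 'n' as noun and 'v' as verb is an assumption based on the usual jieba tag set.

```javascript
// Sample data copied from the tag() output shown above.
const tagged = [
  { word: "红掌", tag: "n" },
  { word: "拨", tag: "v" },
  { word: "清波", tag: "n" },
];

// Hypothetical helper: keep only the words carrying the requested tag.
function wordsWithTag(taggedWords, wantedTag) {
  return taggedWords
    .filter((entry) => entry.tag === wantedTag)
    .map((entry) => entry.word);
}

const nouns = wordsWithTag(tagged, "n");
console.log(nouns); // [ '红掌', '清波' ]
```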

Keyword Extractor

import { extract, textRankExtract } from "nodejieba";

const topN = 4;

console.log(extract("升职加薪,当上CEO,走上人生巅峰。", topN));
//[ { word: 'CEO', weight: 11.739204307083542 },
//  { word: '升职', weight: 10.8561552143 },
//  { word: '加薪', weight: 10.642581114 },
//  { word: '巅峰', weight: 9.49395840471 } ]

console.log(textRankExtract("升职加薪,当上CEO,走上人生巅峰。", topN));
//[ { word: '当上', weight: 1 },
//  { word: '不用', weight: 0.9898479330698993 },
//  { word: '多久', weight: 0.9851260595435759 },
//  { word: '加薪', weight: 0.9830464899847804 },
//  { word: '升职', weight: 0.9802777682279076 } ]

See the test cases for details.
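In the sample output above, the TextRank weights top out at 1 while the extract weights do not. If you want to compare the two lists on a similar scale, one option is to divide each weight by the largest one. The sketch below does this with the sample extract output as input data; `normalizeWeights` is a hypothetical helper and not part of NodeJieba.

```javascript
// Sample data copied from the extract() output shown above.
const tfidfKeywords = [
  { word: "CEO", weight: 11.739204307083542 },
  { word: "升职", weight: 10.8561552143 },
  { word: "加薪", weight: 10.642581114 },
  { word: "巅峰", weight: 9.49395840471 },
];

// Hypothetical helper: rescale weights so the largest becomes 1.
// Returns new objects and leaves the input array untouched.
function normalizeWeights(keywords) {
  const max = Math.max(0, ...keywords.map((k) => k.weight));
  if (max === 0) {
    // Empty input or all-zero weights: nothing to rescale.
    return keywords.map((k) => ({ ...k }));
  }
  return keywords.map((k) => ({ word: k.word, weight: k.weight / max }));
}

const normalized = normalizeWeights(tfidfKeywords);
console.log(normalized.map((k) => `${k.word}: ${k.weight.toFixed(3)}`));
```

The order of the keywords is preserved, so the first entry is still the highest-ranked one.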

Node.js Support

  • v16
  • v18
  • v20

Use Cases

Similar projects

Performance

NodeJieba is intended to be the fastest Chinese word segmentation module available for Node.js. A benchmark write-up is available in Chinese: [Jieba 中文分词系列性能评测] (Jieba Chinese Word Segmentation Series Performance Evaluation).

Online Demo

http://cppjieba-webdemo.herokuapp.com/ (Chrome is recommended)

Contact

Email: i@yanyiwu.com

Author

Contributors

Code Contributors

This project exists thanks to all the people who contribute.

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. Contribute