# Introduction to Part-of-Speech Tagging

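**Baidu Baike definition**: Part of speech is the classification of words according to their characteristics. "Word class" is a linguistic term for the grammatical classification of the words in a language. Words are grouped primarily by grammatical features (syntactic function and morphological change), with lexical meaning also taken into account.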

**Wikipedia definition**: In traditional grammar, a part of speech (abbreviated form: PoS or POS) is a category of words (or, more generally, of lexical items) which have similar grammatical properties.

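Common tagging approaches include:
*   POS tagging based on statistical models
*   POS tagging that combines statistical and rule-based methods
*   POS tagging based on deep learning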
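To make the statistical approach concrete, here is a minimal sketch of a hidden Markov model tagger decoded with the Viterbi algorithm. The training sentences, the function names (`train_hmm`, `log_prob`, `viterbi`) and the add-one smoothing are all assumptions chosen for illustration. None of the tools tested below is claimed to work exactly this way.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-made training set (hypothetical, for illustration only).
TRAIN = [
    [("我", "r"), ("爱", "v"), ("语言", "n")],
    [("我", "r"), ("是", "v"), ("学生", "n")],
    [("学生", "n"), ("爱", "v"), ("自然", "n")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"              # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence as (word, tag) pairs."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # best[i][t] = (log score of the best path ending in tag t, back-pointer)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))

trans, emit, vocab = train_hmm(TRAIN)
tagged = viterbi(["我", "爱", "自然"], trans, emit, vocab)
print(" ".join("%s/%s" % (word, tag) for (word, tag) in tagged))
```

With this toy data the three words come out tagged as pronoun (`r`), verb (`v`) and noun (`n`). Real taggers estimate the same two kinds of probabilities from large annotated corpora rather than three sentences.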

# Testing Several POS Tagging Tools

## jieba --https://github.com/fxsjy/jieba

In [0]:
import jieba.posseg

In [0]:
# posseg.cut yields pair objects that unpack into (word, tag)
posseg_list = jieba.posseg.cut("我是褚安康，我爱自然语言处理。")
print(" ".join("%s/%s" % (word, tag) for (word, tag) in posseg_list))

Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.955 seconds.
Prefix dict has been built succesfully.


我/r 是/v 褚/nr 安康/nr ，/x 我/r 爱/v 自然语言/l 处理/v 。/x


## SnowNLP --https://github.com/isnowfy/snownlp

In [0]:
! pip install snownlp

Collecting snownlp
  Downloading https://files.pythonhosted.org/packages/3d/b3/37567686662100d3bce62d3b0f2adec18ab4b9ff2b61abd7a61c39343c1d/snownlp-0.12.3.tar.gz (37.6MB)
Building wheels for collected packages: snownlp
  Building wheel for snownlp (setup.py) ... done
  Created wheel for snownlp: filename=snownlp-0.12.3-cp36-none-any.whl size=37760958 sha256=1cae11448112ca8845a48a5626fccd6482ae9fa4302926f6e0816d24db1cea19
  Stored in directory: /root/.cache/pip/wheels/f3/81/25/7c197493bd7daf177016f1a951c5c3a53b1c7e9339fd11ec8f
Successfully built snownlp
Installing collected packages: snownlp
Successfully installed snownlp-0.12.3


In [0]:
from snownlp import SnowNLP                                                   

In [0]:
snow_result = SnowNLP("我是褚安康，我爱自然语言处理。")                                     
print(" ".join("%s/%s" % (word, tag) for (word, tag) in snow_result.tags)) 

我/r 是/v 褚/nr 安康/a ，/w 我/r 爱/v 自然/n 语言/n 处理/vn 。/w


## pkuseg --https://github.com/lancopku/pkuseg-python

In [0]:
! pip install pkuseg

Collecting pkuseg
  Downloading https://files.pythonhosted.org/packages/36/d8/2cd2d21fc960815d4bb521e1e2e2f725c0e4d1ab88cefa4c73520cd84825/pkuseg-0.0.22-cp36-cp36m-manylinux1_x86_64.whl (50.2MB)
Installing collected packages: pkuseg
Successfully installed pkuseg-0.0.22


In [0]:
import pkuseg

In [0]:
# postag=True makes cut() return (word, tag) pairs instead of plain words
pku_seg = pkuseg.pkuseg(postag=True)
pku_results = pku_seg.cut("我是褚安康，我爱自然语言处理。")
print(" ".join("%s/%s" % (word, tag) for (word, tag) in pku_results))

我/r 是/v 褚/n 安康/a ，/w 我/r 爱/v 自然/n 语言/n 处理/v 。/w


## THULAC --https://github.com/thunlp/THULAC-Python

In [0]:
! pip install thulac

Collecting thulac
  Downloading https://files.pythonhosted.org/packages/98/f2/f5893d06e744fe228f06ea1f340c90d15f55b0e3b0148762ab234af4573c/thulac-0.2.1.tar.gz (52.9MB)
Building wheels for collected packages: thulac
  Building wheel for thulac (setup.py) ... done
  Created wheel for thulac: filename=thulac-0.2.1-cp36-none-any.whl size=53141669 sha256=1a363a2b9382d4758a196b98b1a72cb5501210fec6fbe10b5e6f287fd3f2804a
  Stored in directory: /root/.cache/pip/wheels/db/36/4a/1ac1e9b9ce727a9dfc7fa20092992707d7da162df871c8488f
Successfully built thulac
Installing collected packages: thulac
Successfully installed thulac-0.2.1


In [0]:
import thulac

In [0]:
thup = thulac.thulac()
thulac_result = thup.cut("我是褚安康，我爱自然语言处理。")                                          
print(" ".join("%s/%s" % (word, tag) for (word, tag) in thulac_result))

Model loaded succeed
我/r 是/v 褚安康/np ，/w 我/r 爱/v 自然/n 语言/n 处理/v 。/w


## pyhanlp --https://github.com/hankcs/pyhanlp

In [0]:
! pip install pyhanlp



In [0]:
from pyhanlp import HanLP

下载 http://hanlp.com/static/release/hanlp-1.7.5-release.zip 到 /usr/local/lib/python3.6/dist-packages/pyhanlp/static/hanlp-1.7.5-release.zip
100.00%, 1 MB, 308 KB/s, 还有 0 分  0 秒   
下载 https://file.hankcs.com/hanlp/data-for-1.7.5.zip 到 /usr/local/lib/python3.6/dist-packages/pyhanlp/static/data-for-1.7.5.zip
100.00%, 637 MB, 7881 KB/s, 还有 0 分  0 秒   
解压 data.zip...


In [0]:
hanlp_result = HanLP.segment("我是褚安康，我爱自然语言处理。")
print(" ".join("%s/%s" % (term.word, term.nature) for term in hanlp_result))

我/rr 是/vshi 褚/nr 安康/an ，/w 我/rr 爱/v 自然语言处理/nz 。/w
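
The five outputs recorded so far differ in both segmentation and tag set, so a side-by-side view helps. The sketch below only re-parses the output strings copied verbatim from the cells above and does not call any tagger. The names `recorded`, `parse`, `parsed` and `whole_name` are illustrative assumptions.

```python
# Output lines copied verbatim from the cells above.
recorded = {
    "jieba":   "我/r 是/v 褚/nr 安康/nr ，/x 我/r 爱/v 自然语言/l 处理/v 。/x",
    "SnowNLP": "我/r 是/v 褚/nr 安康/a ，/w 我/r 爱/v 自然/n 语言/n 处理/vn 。/w",
    "pkuseg":  "我/r 是/v 褚/n 安康/a ，/w 我/r 爱/v 自然/n 语言/n 处理/v 。/w",
    "THULAC":  "我/r 是/v 褚安康/np ，/w 我/r 爱/v 自然/n 语言/n 处理/v 。/w",
    "pyhanlp": "我/rr 是/vshi 褚/nr 安康/an ，/w 我/rr 爱/v 自然语言处理/nz 。/w",
}

def parse(line):
    """Turn 'word/tag word/tag ...' into a list of (word, tag) tuples."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]

parsed = {tool: parse(line) for tool, line in recorded.items()}

# Tools that kept the personal name 褚安康 as a single token.
whole_name = [tool for tool, pairs in parsed.items()
              if any(word == "褚安康" for word, _ in pairs)]

for tool, pairs in parsed.items():
    words = [word for word, _ in pairs]
    print("%-8s %2d tokens: %s" % (tool, len(pairs), " | ".join(words)))
print("name kept whole by:", whole_name)
```

On this sentence only THULAC keeps 褚安康 as one token (tagged `np`), and only pyhanlp keeps 自然语言处理 as one word (tagged `nz`). The tag sets are also not interchangeable: jieba marks punctuation as `x` while the other four use `w`.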


## FoolNLTK --https://github.com/rockyzhengwu/FoolNLTK

In [0]:
! pip install foolnltk

Collecting foolnltk
  Downloading https://files.pythonhosted.org/packages/76/0a/55ffd34458a8b6bd38c2591b712e56acbaf69c3489a3dfaae3475e0dc2c4/foolnltk-0.1.6.tar.gz (60.8MB)
Building wheels for collected packages: foolnltk
  Building wheel for foolnltk (setup.py) ... done
  Created wheel for foolnltk: filename=foolnltk-0.1.6-cp36-none-any.whl size=60814751 sha256=bc00840c55d3a1d476106026dacfd044ab0c4eb64053a79418601aa4b76a8376
  Stored in directory: /root/.cache/pip/wheels/41/42/ab/c318b90c3959fc8da0909801939e636da62785ff8b44eadeef
Successfully built foolnltk
Installing collected packages: foolnltk
Successfully installed foolnltk-0.1.6


In [0]:
import tensorflow
import fool

In [0]:
# fool_result = fool.pos_cut("我是褚安康，我爱自然语言处理。")
# print(" ".join("%s/%s" % (word, tag) for (word, tag) in fool_result[0]))

## LTP --https://github.com/HIT-SCIR/ltp, pyltp --https://github.com/HIT-SCIR/pyltp


In [0]:
! pip install pyltp

Collecting pyltp
  Downloading https://files.pythonhosted.org/packages/aa/72/2d88c54618cf4d8916832950374a6f265e12289fa9870aeb340800a28a62/pyltp-0.2.1.tar.gz (5.3MB)
Building wheels for collected packages: pyltp
  Building wheel for pyltp (setup.py) ... done
  Created wheel for pyltp: filename=pyltp-0.2.1-cp36-cp36m-linux_x86_64.whl size=32016155 sha256=06ee3fe30858fa922be93d26c7d6832104d0b43da13db3cf4cff016356a71448
  Stored in directory: /root/.cache/pip/wheels/fc/3a/35/b11293efb2c77c0e7b6fa574271d51cddd9abd1f634535343c
Successfully built pyltp
Installing collected packages: pyltp
Successfully installed pyltp-0.2.1
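
The notebook installs pyltp but does not run it. As a hedged sketch, tagging with the pyltp 0.2.x API might look like the function below. It assumes the LTP model files have been downloaded separately and that the model directory contains `cws.model` and `pos.model`. The helper name `ltp_pos_tag` and the `model_dir` argument are hypothetical, and this was not run as part of the notebook, so no output is shown.

```python
import os

def ltp_pos_tag(text, model_dir):
    """Segment and POS-tag `text` with pyltp 0.2.x; return (word, tag) pairs.

    `model_dir` is assumed to hold the separately downloaded LTP models.
    """
    # Imported lazily so that defining this helper does not require pyltp.
    from pyltp import Segmentor, Postagger

    segmentor = Segmentor()
    postagger = Postagger()
    try:
        segmentor.load(os.path.join(model_dir, "cws.model"))  # segmentation model
        postagger.load(os.path.join(model_dir, "pos.model"))  # POS tagging model
        words = list(segmentor.segment(text))
        tags = list(postagger.postag(words))
    finally:
        # Free the native model memory held by both components.
        segmentor.release()
        postagger.release()
    return list(zip(words, tags))

# Example call (the directory name is an assumption; adjust to your download):
# ltp_result = ltp_pos_tag("我是褚安康，我爱自然语言处理。", "ltp_data_v3.4.0")
# print(" ".join("%s/%s" % (word, tag) for (word, tag) in ltp_result))
```

Returning `(word, tag)` pairs keeps the result printable with the same `"%s/%s"` pattern used for the other tools.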


## Stanford CoreNLP --https://stanfordnlp.github.io/CoreNLP/, stanfordcorenlp --https://github.com/Lynten/stanford-corenlp

In [0]:
! pip install stanfordcorenlp

Collecting stanfordcorenlp
  Downloading https://files.pythonhosted.org/packages/35/cb/0a271890bbe3a77fc1aca2bc3a58b14e11799ea77cb5f7d6fb0a8b4c46fa/stanfordcorenlp-3.9.1.1-py2.py3-none-any.whl
Installing collected packages: stanfordcorenlp
Successfully installed stanfordcorenlp-3.9.1.1
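
The notebook stops after installing the wrapper. A hedged sketch of how it might be used for Chinese POS tagging follows. It assumes Java is available and that a CoreNLP distribution, including the Chinese models jar, has been unpacked locally. The helper name `corenlp_pos_tag` and the `corenlp_dir` argument are hypothetical, and this was not run here, so no output is shown.

```python
def corenlp_pos_tag(text, corenlp_dir, lang="zh"):
    """POS-tag `text` through the stanfordcorenlp wrapper; return (word, tag) pairs.

    `corenlp_dir` is assumed to be an unpacked CoreNLP distribution that
    also contains the models jar for `lang`.
    """
    # Imported lazily so that defining this helper does not require the package.
    from stanfordcorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP(corenlp_dir, lang=lang)  # starts a local Java server
    try:
        return nlp.pos_tag(text)
    finally:
        nlp.close()  # shut the background server down

# Example call (the path is an assumption; adjust to your installation):
# corenlp_result = corenlp_pos_tag("我是褚安康，我爱自然语言处理。", "/path/to/stanford-corenlp")
# print(" ".join("%s/%s" % (word, tag) for (word, tag) in corenlp_result))
```

CoreNLP's Chinese tagger uses Penn Chinese Treebank style tags, so its labels would not line up one-to-one with the tag sets printed by the tools above.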
